





BIOMETRIKA 

A JOURNAL FOR THE STATISTICAL STUDY OF 
BIOLOGICAL PROBLEMS 


FOUNDED BY 

W. F. R. WELDON, FRANCIS GALTON and KARL PEARSON 

EDITED BY 

KARL PEARSON 

ASSISTED BY 

EGON SHARPE PEARSON 

VOLUME XXIV 

1932 


ISSUED BY THE BIOMETRIC LABORATORY 
UNIVERSITY COLLEGE, LONDON 
AND PRINTED AT THE 
UNIVERSITY PRESS, CAMBRIDGE 



PRINTED IN GREAT BRITAIN 



CONTENTS OF VOLUME XXIV 

Memoirs

I. Hereditary Entropion and Hereditary Changes in the Skin of the
Eyelids. By C. H. Usher. With fourteen figures in the text, one
folding pedigree and four plates . . . 1

II. The Geometric Properties of Microscopic Configurations. I. General
Aspects of Projectometry. By William R. Thompson . . . 21

III. The Geometric Properties of Microscopic Configurations. II. Incidence
and Volume of Islands of Langerhans in the Pancreas of a
Monkey. By William R. Thompson, Raymond Hussey and
others. With one figure in the text . . . 27

IV. A Bessel Function Distribution. By A. T. McKay. With one
diagram in the text . . . 39

V. On the Normality or Want of Normality in the Frequency Distributions
of Cranial Measurements. By E. M. Elderton and
T. L. Woo. With three diagrams in the text . . . 45

VI. The Sampling Distribution of the Third Moment Coefficient - an
Experiment. By Joseph Pepper. With three figures in the text . . . 55

VII. Further Contributions to the Sampling Problem. By N. St
Georgescu . . . 65

VIII. A Preliminary Classification of Asiatic Races based on Cranial
Measurements. By T. L. Woo and G. M. Morant. With one
map, two diagrams in text and two folding sheets of tables . . . 108

IX. A Study of the Crania in the Vaulted Ambulatory of St Leonard’s
Church, Hythe. By Brenda N. Stoessiger and G. M. Morant.
With one map, one plan of Hythe, and six contours in text. Six
tissue contours in pocket, two folding sheets of measurements
and seven plates . . . 135

X. On the Mean Character and Variance of a Ranked Individual, and
on the Mean and Variance of the Intervals between Ranked
Individuals. By Karl Pearson, with the assistance of Margaret
V. Pearson. Part II. The Case of Certain Skew Curves. With
three diagrams and eleven figures in the text . . . 203




XI. Further Applications in Statistics of the T_m(x) Bessel Function.
By Karl Pearson, S. A. Stouffer, and F. N. David. With
three figures in the text . . . 293

XII. Experimental Discussion of the (χ², P) Test for Goodness of Fit.
By Karl Pearson . . . 351

XIII. On the Distribution of the Correlation Coefficient in Small Samples.
By Paul R. Rider. With six figures in the text . . . 382

XIV. The Percentage Limits for the Distribution of Range in Samples
from a Normal Population. By Egon S. Pearson. With one
figure in the text . . . 404

XV. The Converse of Spearman’s Two-factor Theorem. By Burton H.
Camp . . . 418

XVI. The Distribution of the Index in a Normal Bivariate Population.
By E. C. Fieller. With two figures in the text . . . 428

XVII. A Note on the Distribution of the Correlation Ratio. By John
Wishart . . . 441

XVIII. On the Probability that Two Independent Distributions of Frequency
are really Samples from the Same Parent Population. By Karl
Pearson . . . 457

XIX. Certain Generalizations in the Analysis of Variance. By S. S.
Wilks . . . 471


Miscellanea

(i) A Comparison of the Accuracy of Two Types of Quadrature
Formulae. By E. S. Martin . . . 280

(ii) A Simple Non-Normal Correlation Surface. By H. L. Rietz . . . 288

(iii) Note on Professor Rietz’s Problem. Editorial . . . 290

(iv) Note on a Memoir by A. E. R. Church, Biometrika, Vol. XVIII.
By A. E. R. Church . . . 292

(v) On a Method of Proceeding from partial Cell Frequencies to
Ordinates and to total Cell Frequencies in the case of a
bivariate Frequency Surface. By Jacques Chapelin . . . 495

(vi) On the Betas of Quadrilateral Distributions. By Owen L.
Davies. With three figures in the text . . . 498







List of Plates, etc.

Biometrika Portrait Series, No. IX. A. A. Markoff, 1856-1922 . . . Frontispiece

(α) C. H. Usher: Hereditary Entropion and Hereditary Changes
in the Skin of the Eyelids.

Folding Pedigree . . . to face page 10
Plates I-IV . . . to face page 20

(β) T. L. Woo and G. M. Morant: A Preliminary Classification
of Asiatic Races based on Cranial Measurements.

Folding Table of Coefficients of Racial Likeness between
Oriental Races . . . to face page 122
Folding Table of Coefficients of Racial Likeness between
Northern Mongolian and Oriental Races . . . to face page 126

(γ) B. N. Stoessiger and G. M. Morant: A Study of the Crania
in the Vaulted Ambulatory of Saint Leonard’s Church,
Hythe.

Plate I. Engravings of the Interior of the Ambulatory
Passage of Saint Leonard’s Church, Hythe . . . to face page 202
Plate II. Typical Hythe Skulls. Norma lateralis . . . to face page 202
Plate III. Typical Hythe Skulls. Norma facialis . . . to face page 202
Plate IV. Typical Hythe Skulls. Norma verticalis . . . to face page 202
Plate V. Abnormal Hythe Skulls . . . to face page 202
Plate VI. Basal Anomalies of the Hythe Skulls . . . to face page 202
Plate VII. Anomalous Hythe Skulls . . . to face page 202
Appendix of Individual Measurements in three tables on
two folding sheets . . . to face page 202

Six Cranial Contours on tissue in pocket at end of volume.





A. A. Markoff, 1856-1922.

“Biometrika” Portrait Series, No. IX. Issued as frontispiece to Vol. XXIV.


Volume XXIV 


MAY, 1932 


Parts I and II 


BIOMETRIKA 

HEREDITARY ENTROPION AND HEREDITARY CHANGES 
IN THE SKIN OF THE EYELIDS. 

By C. H. USHER, M.B., B.Ch. 

The entropion in this pedigree is of the spasmodic variety and appears at a 
much earlier age than the usual spasmodic entropion of elderly people. A few of 
the cases are accompanied by obvious changes in the skin of the lids, but these 
changes in the skin occur more often without entropion. 

Before describing these cases a short note will be given, more particularly on the 
clinical aspect of cases in the literature that appear to have a bearing on the subject 
of these changes in the skin of the lids, and some notes on congenital entropion. 

Changes in the Skin of the Lids. Graf, in 1836, under the title “Local hereditary 
relaxation of the skin,” described a case in which both lower lids were like sacks, with 
skin thinned, in longitudinal and transverse folds; the left side of the neck was 
also affected. Sichel, in 1844, describes ‘ptosis atonique’: “The skin of the upper lids
is flask shaped, wrinkled, sometimes even folded transversely, it may sometimes 
hang in front of the tarsus in form of a transverse fold which descends to its free 
edge, loss of elasticity is noted, the head is carried erect to enable the patient to see.” 
Mackenzie (1854) says, “It is occasionally the case, that when the fold of integument 
is very considerable it presses by its weight, the edge of the lid, along with the cilia, 
inwards, so as to produce a degree of entropium.” Arlt (1874) stated that ptosis 
adiposa is seen in young people in whom the fold of the skin on the upper eyelids is so 
enlarged especially at the outer angle that it lies over the lashes; the skin is red, 
but not inflamed. If a fold of this thin and flexible skin is incised, a soft yellow 
elastic fat is found immediately beneath, which is indistinguishable from orbital 
fat tissue, and is so pressed forwards that a considerable portion of it requires to be 
removed with scissors in order to close the skin wound satisfactorily. Fuchs (1896) 
described a group of cases as blepharochalasis. The disease affects exclusively the 
upper lids. The clinical signs are these: Both upper lids are affected, skin is very 
thin and has lost elasticity, in consequence fine folds lie in all directions, as is seen 
in extensive senile atrophy and relaxation of the skin. The skin resembles crinkled 
cigarette paper. Increase in surface of skin, peculiar reddening of lids, numerous 
small distended veins in the skin like those seen in old people with red cheeks. 
Skin changes are most marked between brow and upper edge of tarsus. Relaxation of 
the subcutaneous cellular tissue. As a result the skin hangs down in the form of an 
unpleasant relaxed and reddened sack over the lid margin and may suggest ptosis. 
Occurs in male and female, in youth and middle age. The patients call the condition
a swelling. In most cases it could not be determined whether the change in the
lids had been preceded by oedematous swelling or not. Fuchs mentions that all 
these cases of drooping of the skin over the lid margin had been given the common 
name epiblepharon (v. Ammon). Since this paper was published a number of cases 
have been described bearing on the subject of blepharochalasis. Fehr (1898), in his
case of “Lidhaut-Erschlaffung sog. Blepharochalasis,” makes no mention of wrinkling
or of veins in the skin of the lids. Schmidt-Rimpler (1899), in his paper on “Fett-Hernien
der oberen Augenliden,” describes a protrusion of fat through the musculature
of the lid found at operation. A similar case was reported by Meguro in 1930.
Rohmer (1900), in an article “De l’angio-mégalie symétrique des paupières supérieures,”
describes four cases confined to the upper lids of women. Terson (1904) describes
three cases under the heading “Dermatolysie palpébrale.” In the first case the upper
lids were swollen and hyperaemic. At times the thin skin hung as a sack on the lashes. 
In the second case the upper lids were swollen and red. In the third case the skin 
of the left upper lid showed since birth expansion and an unsightly droop. Scrini's 
case (1906) had the upper lids swollen from oedema during migrainous attacks. 
The orbital lachrymal gland was felt through the outer part of the orbito-palpebral 
furrow. The free edge of the lid in two-thirds of its extent was covered by a cutaneous 
fold formed by flabby skin, thin as honeycomb, and of pale colour. With cold it
became red, some venules seen especially in the left one. Lafon and Villemont 
(1906) describe a case on the upper lid showing a continuous fold parallel to the ciliary 
margin; at the junction of inner and middle third of the lid this fold descended 
obliquely downwards and outwards, forming an apron which hid the ciliary margin and 
the upper part of the outer canthus; the pupil was partly hidden. A hard, painless, 
mobile body was felt in the fold, an ectopic orbital lachrymal gland. No wrinkling 
of the skin is mentioned. No oedema or congestion of the lids had been noticed 
until a recent attack in the left. Watering occurred readily from wind and cold. 
The title of the paper is “Blépharochalasis Héréditaire avec Dacry-adénoptose.”
Bach (1906) believes that the clinical picture of blepharochalasis may be produced 
by a collection of fat in the upper lids. Wagenmann (1907) reported a case of ptosis 
adiposa in a youth of 16. Weinstein (1909) records a typical case of ptosis adiposa 
that is, he says, a case of ptosis atonica complicated by the occurrence of fat under 
the skin. Ptosis atonica and ptosis adiposa appear as two different forms of the 
same disease. He would prefer the designation ptosis atrophica for ptosis atonica. 
Loeser (1908) found luxation of the lachrymal gland in a case of blepharochalasis. 
Stieren (1914) reported two cases of blepharochalasis. In the first subcutaneous 
masses due to fat were present and could be reduced into the orbit, when the 
appearances were those of wrinkled skin as described by Fuchs. In the second case 
there were several attacks of swelling of lids, the skin was wrinkled and thin, no masses 
were felt in the eyelids, but a rounded elongated mass was felt under the upper
margin of orbit. In Jenison’s case of blepharochalasis (1915) swelling was not intermittent.
Skin of upper lids was thin, shiny, pink and there was definite atrophy.
Dilated veins very evident, lachrymal glands not palpable, no excess of fat in upper 
lids. Wassermann test; given Hg. bichloride and potas. iodide. Two days later pain
in lids followed by swelling. On 10th day swelling had disappeared but skin hung
lax and flaccid. Heckel’s (1921) case designated blepharochalasis with ptosis was
unilateral. Verhoeff and Friedenwald (1922), in a paper on blepharochalasis, describe
the case of a female with upper lids swollen, skin finely wrinkled, loose and redundant
like that of old age when fat tissue has disappeared, but in contrast to this
lids were puffy. No nodules present. Friedenwald (1923) describes the case of a 
male with both upper lids baggy, twice as much skin surface as is normal, skin soft 
and pliable, in no way suggesting oedema, with many fine wrinkles and deficiency 
of subcutaneous connective tissue. The excess skin formed a fold which extended a 
little below ciliary margin and covered upper half of pupillary areas. Cilia of outer 
one-third of upper lids were turned in and constantly rubbed against the cornea. 
Weidler’s first case (1913) had bagginess and drooping of skin and subcutaneous tissue
over edge of lid margin. No wrinkling or folding, skin smooth. In second case the skin, 
pinkish red, hung down in a baggy pouch-like mass partially covering the eyes, 
superficial veins prominent, no wrinkling mentioned, lachrymal gland could be rolled 
under the finger. Benedict (1926) describes three cases of blepharochalasis. 

Attention has been drawn by a few writers in Germany and Holland to the 
association between the occurrence of double lip* and changes in the skin of the 
lids. Ascher (1920) describes several cases with blepharochalasis, goitre and double 
lip. Wirths (1920), in a paper entitled “Beiderseitige Lidgeschwulst, kombiniert mit 
Geschwulstbildung der Oberlippe,” gives the case of a male, age 23, with a history 
of changes in both lids and upper lip since birth. Weve (1921) described the case 
of a soldier, age 27, with typical blepharochalasis and double lip. Eigel’s case
(1925) was that of a man who according to his mother developed double lip and 
blepharochalasis at the age of 10. 

The association of double lip with blepharochalasis has no immediate concern 
with the subject of this paper, for no double lip has been found in any member of 
the pedigree. But as these cases are not well known, notes of two examples, shown 
to me recently by medical friends, may not be out of place here.

Case 1. (Plate I, Fig. 1.) D. Mc.G., male, age 74, seen 10th May, 1931. On the 
upper lids are large folds of skin. These cover the whole of the lid margin so that no 
lashes of the upper lid are seen on the left side and only the ends of a few on the 
right side. The edge of the fold covers the upper part of the pupil, so that when 
talking he holds his head well back to enable him to see better. The skin of the
fold is very thin and has numerous fine wrinkles placed in all directions. There is
marked loss of elasticity of the skin, which shows numerous blue veins. Lower lids
have wrinkled skin, but otherwise are normal. There is a horizontal, smooth, soft
ridge behind his upper lip. This has the appearance and consistence of mucous
membrane. It obscures his upper gums with his few remaining teeth. His thyroid
gland is not enlarged. He dates the lid condition from the age of seven or eight, and at
the age of 19 the unusual appearance was commented on by the late Sir William
McGregor.

* The term “double lip” is applied to a condition occasionally met with in young men, in which
there is an hypertrophy of the labial glands in the mucous membrane of the upper lip. It is of slow
growth, and forms an elongated swelling on each side of the frenum, covering the teeth and projecting
the lip. It is shotty to the feel, and the only complaint is of disfigurement. The treatment consists in
excising the redundant fold of mucous membrane, including the enlarged mucous glands. Manual of
Surgery, by Alexis Thomson and Alexander Miles, Vol. 2, 5th edition, p. 165, 1915.

Neustatter distinguished two kinds of double lip in adults. See Ascher’s first paper. In a treatise on
surgery by George Ryerson Fowler, Vol. 1, p. 477, 1906, occur the following words: “Congenital
hyperplasia of the labial substance is sometimes observed. The thickening may be due to an excessive
thickening of lymph-vessels (lymph-angioma) or the hyperplastic condition may refer more to the
mucous membrane, becoming visible as a ‘double lip’ during the act of laughing.”

Case 2. (Plate II, Fig. 2.) Mr G. MacD., age 64, when looking horizontally forwards,
nearly the whole of the margin of the left upper lid is covered by a fold of skin and 
more than half of the right upper lid margin is similarly covered. There is moderate 
wrinkling of the skin fold, no veins are visible in the skin. He noted the folds for 
the first time two years ago. Marked double lip which is soft and red like mucous 
membrane. It extends laterally nearly to angles of mouth where it blends with the 
proper upper lip. The frenum joins the mucous membrane of the double lip. He has 
had the condition since boyhood. His natural front teeth are present. He knows of no 
one of his own family with the same condition. To illustrate marked wrinkling of the 
skin of the lids and loss of elasticity Fig. 3 is shown. The two vertical ridges on 
the left upper lid have remained after pinching up the skin. The photograph is that of 
a female, age 57. The condition began to develop when she was 21. She also states 
that her mother’s eyelids were similarly affected.

From the literature it is obvious that skin folds of the lids have received a 
number of names by different writers, and it is not always possible to decide whether 
the conditions referred to by some under one name are the same as those described 
by others under other names. Scrini regards his case as the blepharochalasis of 
Fuchs, the cutaneous ptosis of Panas, the paupière en besace of Frenkel, the
angio-mégalie palpébrale of Rohmer or yet the dermatolysie palpébrale, but with
subluxation of the orbital lachrymal gland. Elschnig, according to Ascher, reserves the
term blepharochalasis for those cases which besides skin atrophy and laxness show,
at least with pressure on the eyeball, orbital contents (fat, lachrymal gland). He
excludes, therefore, atrophy of the skin of the lid in the aged, lid skin atrophy with
laxness and drooping, the “paupière en besace” according to Ginestous and Frenkel,
ptosis atrophica according to Weinstein, and many cases of the old ptosis atonica.
On the other hand he includes fat hernia of the upper lid (Schmidt-Rimpler),
lipomatosis of upper lid and ptosis adiposa, which Wagenmann includes as
hypertrophy of the fat tissue. Fuchs expressly states that he does not wish blepharochalasis
to be included in the name ptosis atonica. 

As the skin changes in the present cases take several forms it was deemed 
unsatisfactory to attempt to include them under any less comprehensive heading 
than “changes in the skin of the lids.” The term epiblepharon or ptosis atonica 
would be suitable for some of the forms and possibly blepharochalasis in a few cases, 
though none of these cases agree exactly with Fuchs’s description.

Microscopical examination of the upper lid tissue in cases with skin folds has 
been made by a number of observers. Atrophic skin changes were found by some
and infiltration by others. Elastic fibres have been found diminished by some
investigators and normal by others. In blepharochalasis Miglietta found vascular 
and lymphatic changes in the skin of the lids. As none of the cases in this pedigree 
were examined microscopically, it is not intended to pursue the subject further. 
Reports have been published by Schmidt-Rimpler, Rohmer, Rosenstein, Lodato, 
Scrini, Stieren, Heckel, Wagenmann, Verhoeff and Friedenwald, Friedenwald, Eigel, 
Weidler, Miglietta and others. 

The age at onset of blepharochalasis has been given from infancy to 20 years. 
It is apparently an uncommon condition. Ascher states that only one case was seen 
among 30,000 out-patients. On the other hand Lafon and Villemont consider that 
it is not so rare as Scrini’s article might make one think, and say that all ophthalmologists
have observed it, but that in general the patients are not much concerned
about an infirmity so trifling and they come to consult for other reasons. 

The etiology of blepharochalasis has been ascribed to angioneurotic oedema of 
the upper lids, to fat from the orbit, and to a collection of fat in the upper lids. 
Rohmer believes that excess of orbital fat may have an influence at least on the 
degree of prominence of the skin of the lids, that dilated lymphatics are secondary, 
that there is an anomaly in development of the vascular system of the lids, and 
that the origin is undoubtedly vaso-motor, therefore his designation “angio-mégalie
des paupières supérieures.” Fuchs believed the cause was of neuropathic nature in
some cases. Terson believes that the trophic disturbance in dermatolysie palpébrale
probably depends on a neuro-trophic lesion dependent on the sympathetic.
Scrini is inclined to agree with Terson that dermatolysie palpébrale depends on
general causes and mentions digestive disorders, migraine accompanied by oedema of 
the upper lids, of which the integument by reason of congenital defect of tonicity 
becomes thinned following repeated distension and loses all its elasticity. Disordered 
menstruation is considered to be a causal factor by some. It has been suggested, says 
Benedict, that blepharochalasis may have some relation to the development of the 
thyroid. The onset at 10-17 years places it definitely within the period of functional
endocrine development. Accardi also believes that as in the essential
cutaneous atrophies of the dermatologists, it is necessary in blepharochalasis to
ascribe a rôle to endocrine disturbance without being able to specify which gland is at
fault. Miglietta thinks that blepharochalasis is brought about by hormonal influence 
associated, or not, with dysfunction of the neurovegetative system on the one hand 
and familial predisposition on the other*. Lafon and Villemont suggest that atrophy 
and looseness of the suspensory fibres of the skin of the lid may be hereditary. Ptosis 
adiposa, according to Weidler, presents some of the signs of blepharochalasis. There 
is relaxation of the skin, but no true atrophy follows the condition. The bagging of 
the upper lid is more marked at the inner side in ptosis adiposa and is thought to 
be due to relaxation of bands of fascia connecting the skin with the tendons of the
levator and with the margin of the orbit. Mention has already been made of the
conditions Elschnig includes and excludes under the name blepharochalasis.

* The author recalls the case of a female with atrophic striae on the thighs and under the breasts
though she had never had children and had experienced no noteworthy thinning of the body pointing
to destruction of elastic fibres of the same nature and determined by the same cause which acted on the
eyelids.

Heredity appears in only a small number of the published cases of blepharochalasis. 
In Graf’s case (Fig. 1), which Franceschetti apparently accepts as one of blepharochalasis,
only the lower lids were affected. The family was Russian. The four cases in
three generations were similarly affected. II, 1 married and had no issue. II, 2 was not
affected and died in his 71st year. In the four cases the symptoms arose in the fourth 
decade. The cases showed also relaxation of the skin on left side of neck. 



Fig. 1. Fig. 2. 


Schmidt-Rimpler’s case of fat hernia of the upper lid in a female had a mother 
affected with the same condition (Fig. 2). 

In Lafon and Villemont’s case (Fig. 3) blepharochalasis occurred in four generations.
The patient III, 1 was a man aged 43. His mother and maternal grandfather
had a similar palpebral malformation. Besides he had five children of whom the 
second, a son aged 18, had the characteristic fold. The four other children were not 
affected. On the left side III, 1 had an ectopic lachrymal gland*. 





Fig. 3. 


See note on p. 5. 


Fig. 4. 



Fig. 5. Fig. 6.


Franceschetti after referring to the rarity of hereditary blepharochalasis says that 
the skin fold of the upper lid that resembles blepharochalasis (epiblepharon) occurs 
as a familial form more frequently. In his Fig. 37* (Fig. 4 above) he shows a family 
tree. There is no description of the cases.

In Müller’s first pedigree (Fig. 5) II, 1, age 1½, has a fold of skin
on both lower eyelids and entropion of the right lower lid. I, 1 has
a slight degree of epiblepharon of both lower lids. II, 2, age six
months, has marked epiblepharon of both lower lids. The father
of I, 1 according to I, 1 had peculiar lower lids, but it is not
certain that he had epiblepharon.

In Müller’s second pedigree (Fig. 6) a brother and sisters
and three sons of the brother had epiblepharon of both lower
lids. II, 3, age seven, according to the father, had at times
the lashes of the lower lids touching the eyeball. On
present examination it could only be made out that the
lashes of the lower lids were pressed obliquely upwards by
the skin fold. In both adults the appearance of the lids
corresponded with Elschnig’s Fig. 1†.

Miglietta’s first Pedigree (Fig. 7). II, 3, a
woman age 51, had blepharochalasis. I, 2, her
mother, had similar folds in upper eyelids as
patient. II, 1, a sister, and II, 2, a brother, had
bilateral blepharochalasis from an early age,
especially the brother whose affection had
increased with age. The patient as a girl had
enlarged cervical glands. She had chilblains
on hands and feet. Menstruation irregular.
Married and had three children. One died in
infancy. III, 3, a daughter, has signs of
blepharochalasis on left side.

The alteration of the eyelids dates from a 
young age. Patient has never noticed any swelling or reddening of the skin of the 
eyelids. The folds of skin increased more and more. The skin fold is pale, thin and 
inelastic, no net-work of veins is seen. 

Miglietta’s second Pedigree (Fig. 8).
II, 2, a widow, age 62, has blepharochalasis.
Her father I, 1, a small farmer,
with blepharochalasis, died aged 68 of
lung trouble. Patient has two brothers
and four sisters. Almost all of these have
an evident degree of blepharochalasis.
Patient is the second born. She married
and had an abortion. In the chart the only one in the sibship that is not shaded at
all is II, 7.

Fig. 7. Fig. 8.

* “Abb. 87. Vererbung von Blepharochalasis bzw. Epiblepharon des Oberlides (Eigene Beobachtung).”
† Med. Klinik 1922, Jahrg. 18, nr. 16, S. 496.

Congenital Entropion. For readers unfamiliar with the subject it may be stated
that entropion is a rolling inwards of the lid constituting trichiasis so that the 
lashes rub against the cornea. There is a cicatricial variety due to contraction of 
the conjunctiva as a result of burns, accidents, trachoma and other causes and a 
spasmodic or muscular variety caused by contraction of the orbicularis muscle. 
Himly states that late in foetal life the lid margins are markedly turned inwards, 
and that if this condition persists beyond the normal a slight trace of congenital 
entropion may result, but true entropion does not occur congenitally. In 1841 
v. Ammon saw entropion of the lids of the left eye and of the upper lid of the right eye 
in a girl of three, also ectropion of the right lower lid. Primary congenital entropion is 
a rare occurrence and, according to Aubineau, de Wecker in the course of his long
career had never met the deformity. Berry regards this variety of entropion as 
probably due to abnormal development of the orbicularis in the vicinity of the lid 
margin. Secondary entropion is associated with anomalies of the eyeball as microphthalmos
and some conditions of the lids as epicanthus. There appear to be two
forms of primary congenital entropion, (1) a spasmodic variety, and (2) a variety in 
which the tarsus is absent. Guibert (1892), in a case of double congenital entropion 
which the mother had noticed eight days after the birth of the child, found on operating
absence of the tarsal cartilage. Harlan (1895) examined two cases of congenital
entropion of both upper lids with deficiency of tarsal cartilages. Leblond (1907)
mentions two cases of bilateral entropion in sisters. In the younger sister the inturning
of the lids was noticed by the parents at birth. There was no epicanthus, skin
was not hypertrophied. At operation the tarsus was made out to be well developed 
and there was excessive development of the orbicularis in the neighbourhood of the 
ciliary region. Polstoocchow (1914) saw two sisters with congenital entropion, 
which the mother had noticed immediately after birth. The mother also affirmed 
that her last child, aged some months, showed the same defect of development. 
Polstoocchow ascribed the pathogenesis to development of the ciliary bundles 
of the orbicularis muscle. In Sziklai’s case (1917), a boy of four, the congenital 
entropion was due to an exaggerated development of the palpebral portion of the 
orbicularis muscle. In Hessberg’s two cases (1922) the tarsus was normal. Yan 
Chow (1925) describes three cases of entropion of the new born operated on with 
success. The tarsus was in all cases normal. In his first case the upper lid was 
turned in. Schorr (1926) operated for spasmodic entropion of upper lid in a suckling. 
She gives the literature of spasmodic entropion of the upper lid. Aubineau (1928) 
records two cases of congenital entropion with spasmodic appearance and no deficiency
of tarsus. He also records three cases of congenital entropion (in mother, son
and daughter) from deficiency or absence of tarsus. Denig (1899) found in a child 
the inner part of the upper lid inverted. In two cases aged seven years and ten 
months respectively Dimmer (1885) attributes the entropion to excess of skin of 
the lower lid. Gomez Marquez (1930) reports congenital entropion of the lower lids 
in two cases. In one it was bilateral, in the other unilateral. 





Hereditary congenital entropion is evidently very rare. The cases described by 
Leblond, Hessberg, Polstoocchow, and Aubineau are the only instances known to me, 
unless there is included Sydney Stephenson's cases (1894) of two brothers (Fig. 9), 
aged seven and six respectively, with congenital trichiasis, in which the intermarginal 
space was almost normally directed. Inheritance of congenital entropion has been
recorded by Leblond (1907) (Fig. 10). Of two sisters affected, the younger, age eight,
had bilateral entropion of her lower lids. The inturning of the lids was noticed by
the parents at birth. The elder sister, age 18, was operated on for an identical
malformation at the age of four. Father and mother normal.

Fig. 9. Fig. 10.

Polstoocchow (1914) saw (Fig. 11) two sisters with congenital entropion; in one of
them it was bilateral, in the other unilateral. According to the mother, the condition 
was present from birth. She said that her youngest child, age some months, showed 
the same defect. 



Fig. 11. Fig. 12. Fig. 13.


Hessberg (1922) recorded congenital familial entropion of both lower lids in a 
sister, age three, and brother, age two (Fig. 12). 

Aubineau (1928) saw a mother, daughter and son with congenital entropion 
(Fig. 13). In the mother it was localised in the left lower lid. Both lower lids were 
affected in daughter and son. The mother was operated on. There was deficiency 
or absence of the tarsus. 

Attention was drawn to the pedigree (see folding sheet) that is the subject of 
this paper by the occurrence of spasmodic entropion in two men at an unusually 
early age. Each gave a history of having relatives affected with the same condition. 
These two men proved to be first cousins. Investigation showed the correctness of 
their story and that in some cases the entropion was associated with unusual 
conditions of the skin of the lids, which also appeared in some cases apart from 
entropion. They were both in-patients, in 1896, at the Aberdeen Royal Infirmary. 





David R. (III, 53, Plate I, Fig. 4), age 19 (Nov. 28, 1896), fisherman. Diagnosis:
Congenital entropion of right lower lid. Right lashes turned in along whole length 
of lower lid and rubbing against the eyeball, cornea clear, refraction myopic, pupils
equal, contract to light, considerable photophobia; left lower lid tends to turn in
when he is lying down, refraction of left eye hypermetropic. His paternal grandfather
(I, 3) had both lower lids turned in. This is said to have been since birth.
A brother (III, 51) had both lower lids turned in since birth. They were remedied
by operation. A male (III, 39), first cousin, son of an uncle, had both lower lids 
turned in from birth and another first cousin (III, 41) had one lid turned in from 
birth. Patient has had excellent health. He comes because of defective vision, 
“sight goes away when at sea.” He can usually see to work quite well, but in bright 
sunlight or when snow is on the ground he “can’t hold his head up” owing to pain 
in his eyes. Heart and lungs normal; urine, acid 1022, no albumen. Oval piece of 
skin and some fibres of orbicularis muscle removed from right lower lid. This proved 
to be insufficient so that some weeks later another piece of skin was removed with 
a satisfactory result as five days later the lid edge was remaining well out. When 
visited 34 years later, in January, 1931, there was marked rolling in of right lower 
lid, Fig. 4, though the eyeball was quite quiet. No entropion of left lower lid. Has 
been working regularly on a steam line fishing boat and makes no complaint; eyelashes
on all of the lids look normal, right palpebral fissure is not enlarged, plica
semilunaris and caruncle are normal. Right lower lid is not thinned; R.V. = < [acuity
illegible] with −12 D. sph. combined with 2.50 D. cylinder (sign illegible), axis horizontal,
= [acuity illegible]; L.V. [acuity illegible] fully with +0.50 D. sph.

Ophthalmoscopical examination: in right eye is a posterior staphyloma with complete 
choroidal atrophy, left fundus normal. Eyeballs of full size; no epicanthus, large 
fold of skin on both upper lids greater at outer part, no nodules felt in it, skin on 
forehead furrowed probably from attempts to keep up the lid skin fold, margin of 
upper lid is in good position, but especially on right side it becomes covered to a 
large extent by the skin fold, which requires to be raised by a finger for examination 
of fundus, considerable wrinkling of skin of lids. On right side the lashes on lower 
lid are completely hidden. Is married without issue. 

The notes of the next case, first cousin of the above, are also from the hospital 
records. Wm. R. (III, 41), age 27 (June 30, 1896), fisherman. Diagnosis: R. entropion
and corneal ulcer. In right eye at lower part of cornea is an ulcer, also old corneal 
opacity. About seven years ago patient noticed that lashes of his right lower lid were 
rubbing against the eyeball, causing the eye to become red and inflamed with much 
watering at times. Small ulcers appeared occasionally at the lower part of right eye
where lashes irritated it. These usually soon disappeared. Seven weeks ago he 
noticed a fresh ulcer in right eye which as usual was red and watering and this 
time the ulcer did not go away as formerly. Three years ago there was an ulcer at 
the margin of his left cornea. The opacity spread over nearly the whole of the 
cornea but under treatment “has quite disappeared.” His paternal grandfather 
(I, 3) had lower lids turned in (patient thinks he was born with it). Father is believed
to have had affection of lids; one male first cousin (III, 51) had both lids and another
(III, 58) one lid turned in from birth. Patient as an infant had bronchitis and inflammation
of the lungs. He has been quite healthy ever since, but never very strong.
He looks healthy and well nourished. Heart and lungs normal; urine, acid 1022, no 
albumen, no deposit. An oval piece of skin was removed from right lower lid. The 
corneal ulcer soon healed and the lid margin kept in proper position. 14. 1. 31. He 
died unmarried. 

Alexander R. (III, 51) born in 1871, fisherman. From an entry in the ophthalmic
record book at the Aberdeen Royal Infirmary it was found that he was admitted as
an in-patient on July 30, 1877, for “trichiasis.” When seen on Jan. 20, 1931, he
remembered being sick after the chloroform and having to return to hospital for 
removal of stitches from his lids. His lids were in good position, a scar could be 
detected on right lower lid and a smaller one was present on left lid; puncta, 
caruncle and plica semilunaris are normal; consistence of lower lids feels normal on 
palpation; no epiblepharon. 

Of the next six cases we have the assurance of Dr MacHardy, who has worked 
among these people since 1892 and knows them intimately, that they (i to vi), now
deceased, had entropion; additional reports regarding them given by the relatives 
are as follows: 

(i) George R. (III, 39) had entropion of lower lids.

(ii) George A. (III, 57) married twice, no issue from either marriage, marked
entropion.

(iii) George R. (II, 13), according to his niece Mrs W. (III, 47), had both lower
lids turned in. In the notes of his son, Wm. R. (III, 41), the lids of his father were
reported to be unaffected, but Dr MacHardy is certain that they were affected. It is
possible that though he had entropion he experienced little inconvenience as was
the case with David R. (III, 53), who had worked steadily for many years without
discomfort with entropion of his right lower lid. Another possibility is that the
entropion occurred periodically and was not observed by the son. No members of
his sibship are alive.

(iv) John R. (II, 18), fisherman, according to his daughter, Mrs F. (III, 66),
her husband, and mother-in-law (not shown in chart) had blinking eyes, but none
of them could say there had been entropion. Vision became defective in advanced
life.

(v) Mary A. (II, 16), according to her daughter, Mrs H. (III, 60), had both
lower lids rolled in. She married her first cousin II, 19.

(vi) George R. (I, 3) married Mary A. (I, 4). Both lower lids were rolled in.
The notes of his grandson, Wm. R. (III, 41), state that his paternal grandfather's lower
lids turned in and it was thought that he was born with the condition. A similar
note occurs in the records of David R. (III, 53, Plate I, Fig. 4), another grandchild.

Alexander R. (II, 15) died of tumour of the bowel at age of 54. His eyes were
“the same as his brother John's” (II, 18). Dr MacHardy had seen him, but could
not recollect in what way his eyes were affected. They certainly gave him trouble.




Mrs Robbie H. (I, 1). Her son Robbie H. (II, 1) said that his mother’s eyes
were always watering but he could not say whether the lids rolled in. 

Jessie F. (III, 37), age 72 (1931). Right lower lid turned in chiefly at outer
part, eye is blind, whole of cornea is opaque, eyeball is not shrunken, iris and pupil 
are visible. Ocular conditions attributed by her to an injury of right eye by a stone 
at age of five. No scars found suggesting an injury and her doctor suggests that she 
does not like to admit that the appearance of her eyes is an hereditary blemish. The 
opacity of the cornea may be explained by the lashes rubbing on the eyeball and 
causing inflammation. Eyelashes are normal; large skin fold on both upper lids, 
its lower edge passes from above inner canthus obliquely down and out and covers 
lower margin of upper lid for a considerable distance. Right palpebral fissure 
narrower than left; plica semilunaris and caruncle are normal, lower lids are not 
thinned; the entropion of right lower lid is not constantly present, but is obvious 
when she laughs; no epicanthus. Plate III, Fig. 5. 

Isy R. (III, 42), age 61, sister of the above. Well-marked skin folds on upper
lids, some loss of elasticity. Lids swell from time to time, swelling lasts for half a
day, left lower lid has a fullness at its margin like the appearance seen in lids with
spasmodic entropion. The fullness is not present on right lower lid, eyeballs normal
externally, good vision. Her husband said that one of her lower lids sometimes
rolled in and it was the left one. Previous information received from others was to 
the effect that her lower lids rolled in so that the lashes rubbed on her eyes. 

George C. (IV, 50), age 54, fisherman, said that his eyes began to trouble him 
at the age of five. All of the lids became swollen, but the lower ones were worse than 
the upper. The swellings were intermittent. The eyes began to improve at age of 
16; sometimes edges of lower lids rolled in. He spoke of the “family eyes” and on 
being questioned whether by that he meant a rolling in of the lower lid or a large 
fold of skin on upper lid he replied, “Both.” He has now a small fold on upper lids.
Skin of both upper and lower lids shows some wrinkling and loss of elasticity, 
colour of upper lid is reddish blue. Near lid margin, especially on left side, skin 
feels rather thick; no nodules felt; brow furrowed; eyes not raised. Plate III, Fig. 6. 

Besides those cases with unusual appearances of the skin of the lids, that are 
associated with entropion, and which sometimes suggest blepharochalasis, there are a 
number of similar cases without entropion. 

Mrs H. (III, 60), age 56, has had intermittent swellings of her lids. She cannot
say definitely when they began, but they increased in size during menstruation.
The skin of all the lids is much wrinkled, thin and deficient in elasticity, colour 
normal. No skin fold on upper lids. Thyroid not enlarged. No double lip. Has had 
good health, no digestive or nerve troubles, no migraine. Plate III, Fig. 7. 

John H. (IV, 96), age 27, has a marked fold of skin on each upper lid. They 
overlap the upper lid margins at the outer and middle thirds. There is not much
wrinkling of the skin which has its usual colour and consistence, no veins seen, no 
nodules felt; brow is furrowed, eyebrows lie low and straight, not arched. He is 
myopic, large crescent at lower part of margin of each optic disc. Plate III, Fig. 8. 





Miss H. (IV, 97), age 25, marked fold of skin on each upper lid, especially the 
left one, which covers the outer two-thirds of the upper lid margin, skin is not 
wrinkled, no veins visible, colour not unusual, no nodules felt, consistence normal, 
no loss of elasticity. Lower lids show some furrowing. Formerly her lower lids
swelled; lashes normal, refraction myopic; general health good, no migraine, no
digestive or nerve disturbance. Thyroid not enlarged, no double lip. Plate IV, Fig. 9. 

Mrs Elsie H. (III, 45), youngest in sibship of nine, age not recorded, but her sister
III, 42, fourth youngest, is 61; marked skin folds on upper lids. The skin is thin
and crinkled on both lower and upper lids; horizontal furrows on brow. Plate IV, 
Fig. 10. 

Mr H. (IV, 63), one of her sons, has marked skin folds on upper lids. 

James R. (V, 64), age 14. From eyebrows downwards nearly to lid margins is a 
swelling, skin at this part is reddish, and has lost some of its elasticity, no veins 
visible, no crinkling of the skin, nothing abnormal as regards its consistence, no 
nodules felt in the swellings, eyebrows not raised, no entropion, some furrows on 
lower lids, there is blepharitis (the only case in the pedigree in which this was 
observed). Eyes affected since he was three years old. Swelling has not been inter¬ 
mittent. Health good, no digestive or nerve affection. Plate IV, Fig. 11. 

George R. (V, 65), age 13. Eyes became affected when about two years of age. 
There was intermittent swelling of both lower and upper lids. Has had good health. 
Skin fold of upper lid is greater on left side than on right. The condition is said to 
be less marked now than formerly. Skin of upper and lower lids shows numerous 
horizontal lines, colour and consistence of skin normal, no loss of elasticity. Neither 
he nor his brother James has double lip or enlarged thyroid. 

Master M. (V, 71), age 7. Well-marked swelling like folds on both upper lids. 
Plate IV, Fig. 12. 

William P. (IV, 8), age 17. First born of five. Marked swelling below eyebrow 
on each side. Plate II, Fig. 13. It obscures outer two-thirds of upper lid margin. The 
illustrations of Shoemaker’s case of bilateral enlargement of the lachrymal glands 
and of Wagenmann’s case of ptosis adiposa resemble the appearances presented by
IV, 8 and V, 64. George P. (III, 5), father of IV, 8, has much wrinkling of skin of
lids and a fold of low degree on upper lids. Has five children. First-born age 17. 

Maggie S. (IV, 45) has obliquely lying skin folds on upper lids. Her children, 
Mary S. (V, 33) and Jacob S. (V, 34), have marked fullness under the eyebrows. 
Ages about 18 and 17 years respectively. 

Jessie M. (Mrs S. IV, 46), age 43, on upper lids lax, redundant skin with very 
marked loss of elasticity. 

Georgina Mac. (IV, 49), age 31, bulging of tissue below eyebrow. No swelling 
of lower lids. 

William Mac. (V, 60), child of IV, 49. Right side, swelling below eyebrow is 
greater than that on left side. It presses on the lashes when his head is held erect 
and he looks at an object on the same level as his eyes. 




Alexander Mac. (V, 61), child of IV, 49, fold of skin on right upper lid which 
touches lid edge at its middle portion. 

Annie H. (I, 1) was reported by her son II, 1 to have always had watering eyes. 
Whether the lids rolled in could not be said. There was an indefinite history that 
I, 2 had been affected in the same way. 

When examining the upper eyelids of the relatives it became evident that a 
standard was required to serve as a guide to what should be regarded as a normal 
and what an abnormal appearance of the upper eyelid. It was decided to classify 
as normal those eyelids whose margins were visible and not obscured at any part by
swelling or bulging of the skin or by a skin fold. To form some estimate of the 
proportion of eyelids with such abnormal appearances in the general population 
502 people were examined. These were taken almost exclusively from the general 
wards of the Aberdeen Royal Infirmary and the Royal Aberdeen Hospital for Sick 
Children. In such an examination there are several pitfalls. Should the examinee 
look even slightly upwards the margin of the lid, previously visible, may in some 
cases become obscured. Similarly, when the examinee is fixing an object when the 
visual axes are parallel with the ground a very slight depression of his head may 
cause some of the lid margin to be hidden. Very important is the position of the 
head, which must be held erect during the examination. Another important matter 
when examining doubtful cases is the position of the examiner’s eye, which must 
be on the same level as the examinee’s eyes and the object he is fixing. In some 
cases in which part of the lid margin becomes temporarily obscured frowning is the 
cause. Some children exposed to bright sunshine had the margin of the lid hidden, 
but when brought into the shade the margin became visible. By raising their eyebrows
some individuals can readily expose the upper lid margin previously covered
by a fold of skin. 

The 502 control cases of all ages included 20 persons, 12 males and 8 females, 
with some obscuration of the margin of the upper lid on one or both sides, namely 
3.98 per cent. In dividing the cases into two age groups there are 254 people aged
20 years and upwards and 248 people aged from infancy to 19 years. In the former
group are 10 abnormal cases, that is the upper lids had an abnormal appearance,
namely 3.93 per cent., in the latter group are 10 abnormal cases, namely 4.03 per
cent. Of the 1004 upper eyelids examined 34 were abnormal, i.e. 3.38 per cent.; 6
of the cases were unilateral. In the group of older people were 16 abnormal upper
lids, 3.14 per cent., and in the group of younger people were 18 abnormal upper lids,
3.62 per cent. The ages of those with part of the margin of the upper lid hidden
by the skin of the lid at time of examination were: In first decennium, 7, of whom 
4 males, 3 females; second decennium, 3, all males; third decennium, 2 females; 
fourth decennium, 3, of whom 1 male, 2 females; sixth decennium, 2, of whom 
1 male, 1 female; seventh decennium, 1 male; eighth decennium, 2 males. 
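
The proportions in the preceding paragraph can be re-derived from the stated counts. The sketch below is only an arithmetic check, not part of the original analysis. It assumes the printed percentages were cut to two decimals rather than rounded, which is an inference from the figures themselves, and the names `GROUPS`, `DECENNIA` and `truncated_pct` are hypothetical.

```python
# Control survey counts as stated in the text.
GROUPS = {
    "20 years and upwards": {"people": 254, "abnormal_people": 10, "abnormal_lids": 16},
    "infancy to 19 years": {"people": 248, "abnormal_people": 10, "abnormal_lids": 18},
}

# Affected persons per decennium as (males, females), as stated in the text.
DECENNIA = {1: (4, 3), 2: (3, 0), 3: (0, 2), 4: (1, 2), 6: (1, 1), 7: (1, 0), 8: (2, 0)}


def truncated_pct(part: int, whole: int) -> float:
    """Percentage of part in whole, cut (not rounded) to two decimals.

    Truncation is an assumption made here to match the printed style.
    """
    return (10000 * part // whole) / 100


people = sum(g["people"] for g in GROUPS.values())
abnormal_people = sum(g["abnormal_people"] for g in GROUPS.values())
abnormal_lids = sum(g["abnormal_lids"] for g in GROUPS.values())
lids = 2 * people  # two upper lids per person

# A bilateral case contributes two lids and a unilateral case one,
# so the number of unilateral cases is 2 * persons - lids.
unilateral = 2 * abnormal_people - abnormal_lids

males = sum(m for m, _ in DECENNIA.values())
females = sum(f for _, f in DECENNIA.values())

print("persons, per cent:", truncated_pct(abnormal_people, people))
print("lids, per cent:", truncated_pct(abnormal_lids, lids))
for name, g in GROUPS.items():
    print(name,
          truncated_pct(g["abnormal_people"], g["people"]),
          truncated_pct(g["abnormal_lids"], 2 * g["people"]))
print("unilateral:", unilateral, "males:", males, "females:", females)
```

On these counts the totals agree with the text (502 persons, 1004 lids, 20 affected persons of whom 12 are male and 8 female, 34 affected lids, 6 unilateral cases), and the proportion of affected persons in the younger group works out at 4.03 per cent.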

The pedigree chart shows 13 individuals with entropion, 3 females and 10 males. 
These are distributed in four consecutive generations and invariably the condition 
passes directly from parent to child, usually from father to son or daughter. In one
instance only has an affected female had an affected child; II, 16, a female, had a
son, III, 57, with entropion. Generally the entropion was noticed in early life and
in some cases it was believed to have been present from birth. Owing to the insignificant
disturbance caused by entropion in early life it is probable that in a
number of cases the condition had been present for a longer period than the history 
indicates. Aubineau found symmetrical entropion in an infant of 10 months, that 
manifested no pain, and appeared little disturbed, and the rubbing of the lashes 
produced only a slight injection of the conjunctiva below the cornea. Again, in 
adult life entropion may be present for many years without causing much discomfort 
or interfering seriously with work as was seen in the case of D. R., III, 53, a fisherman.
The entropion in some of the cases was not constantly present. In all cases
it occurred in the lower lid. In some it was bilateral, in others unilateral. 

Since an abundance of extensible skin is necessary for the development of 
spasmodic entropion found commonly in elderly people, it is presumably the same 
in cases occurring in early life, such as the examples seen in this pedigree. A number 
of cases occur in the pedigree with loose skin on the lid. Several cases had relaxed, 
wrinkled skin on the lower lid after intermittent swelling. If the skin of the lower 
lid becomes relaxed two factors would tend to cause entropion, namely strong contraction
of the palpebral portion of the orbicularis muscle and an insufficient supply
of orbital fat. The first factor would operate, especially in the men when at work 
as fishermen, from exposure to bad weather, spray, sun, and reflection from the sea. 
This factor may be of importance in this pedigree. Insufficient orbital fat by causing 
the eyeballs to lie far back in the orbit diminishes the support for the lids. In my 
experience these fisher folk are not generally of robust appearance and many of 
them are small. I am informed that sometimes they obtain insufficient nourishment, 
also that they are injudicious eaters, consuming many cakes and sweetmeats when
they can afford to buy them. Whether these habits are of any importance in the 
causation of this condition I cannot say. No evidence has been obtained that bears 
upon the matter. 

In the pedigree occur several unusual appearances of the skin of the eyelids. 
These changes are found in those without entropion as well as in those with entropion.
Three forms are distinguishable. The first form occurs in the upper lids and
is bilateral. There is obvious swelling extending from the supra-orbital margin 
downwards so as to obscure the tarsal portion of the eyelid and part of the lid 
margin, often the middle and outer thirds. The swelling appears as a fullness or 
bulging and not as a flap. On palpation no nodules are felt in the swelling. The 
skin has a normal appearance or else is redder than usual, few if any veins are 
distinguishable. Two examples of this first form, which may be the earliest stage 
of blepharochalasis or epiblepharon, are seen in Plate IV, Fig. 11 and Plate II, Fig. 13. 
The recognition of the first form, unless it is well marked, may be difficult, or impossible,
owing to its close resemblance to apparently normal conditions found in a
proportion of the general population. Amongst 502 individuals whose ages ranged
from infancy to 81 years, in several a similar bulging of the skin extended so low
as to touch the lashes and hide part of the lid margin, when the individual, with head
held erect, looked straight forward at an object on the same level as the eyes. But 
no such difficulty arises in such marked examples as V, 64 (Fig. 11) and IV, 8 
(Fig. 13). The second form is a fold of thin skin with or without wrinkling, that has 
lost elasticity, and is situated at the usual site of the normal skin fold of the upper 
lid. Its lower margin passes obliquely from above and inwards in a direction downwards
and outwards so as to cover the edge of the lid at its outer and sometimes
middle third. See Plate IV, Fig. 10. This form might be termed ptosis atonica or 
epiblepharon and in some cases possibly blepharochalasis. In the sister and brother, 
IV, 97 and IV, 96, with marked folds on their upper lids, it could not be determined 
whether they had been preceded by swelling of the lids. Their mother, III, 60, who 
had noticed nothing peculiar about her children's eyelids, had noticed no swelling.
The third form occurs in both upper and lower lids and resembles the wrinkled 
skin of old age. The skin is thin, shows loss of elasticity and is wrinkled. See 
Plate III, Fig. 7, a woman who had repeated attacks of swelling of both upper and 
lower lids. As these three clinical forms occur amongst those who are all consanguineous
relatives it is just possible that they are different stages of the same
affection or at any rate that one form may develop into one of the other two 
forms. There is no proof, however, that they do so, and to decide the matter definitely, 
it would be necessary to observe individuals with the first form for prolonged 
periods. That the first form becomes the second or third form is suggested by
the earlier age of its occurrence. The average age, when seen, of six of the seven
cases in the first group in which the age is recorded is 14.33 years. The seventh
case was 31. The average age of cases in the second group was 47.80 years
and in the third group 49.00 years. These are the ages when the cases were seen
and not when the appearances were first manifested. The wrinkled skin (third 
form) which occurs in the lower lid obviously cannot result from the swelling of 
the upper lid (first form). The wrinkling of both upper and lower lids probably 
resulted from intermittent swellings of these lids. (Cases III, 42, IV, 50 and V, 65.) 
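
The mean age quoted for the first group can be reproduced from ages given in the case notes. The text does not list which individuals make up that group, so the selection below is an assumption: six young cases described as having swelling or fullness below the eyebrow, with IV, 49 (age 31) taken as the seventh. The ages themselves are from the notes, and two of them are stated there only as approximate.

```python
from statistics import mean

# Ages in years from the case notes. The membership of the "first group"
# is an assumption of this sketch; the text does not enumerate it.
first_form_ages = {
    "V, 64": 14,
    "V, 65": 13,
    "V, 71": 7,
    "IV, 8": 17,
    "V, 33": 18,  # given as "about 18"
    "V, 34": 17,  # given as "about 17"
}
seventh_case_age = 31  # assumed to be IV, 49

mean_six = mean(first_form_ages.values())
mean_seven = mean([*first_form_ages.values(), seventh_case_age])

print(f"mean of six cases: {mean_six:.2f} years")
print(f"mean of all seven: {mean_seven:.2f} years")
```

With this selection the six ages sum to 86 and give 14.33 years, the value in the text; including the seventh case raises the mean to 16.71 years, still well below the averages quoted for the second and third groups.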

Benedict says that blepharochalasis has an intumescent stage or stage of oedema. 
After several attacks of swelling one of two things may happen. (1) Swelling becomes
constant with bagginess of the skin of the lid so that loose folds hang down
over the margin, giving the appearance of water-filled bags, with the skin altered 
slightly in colour, very thin, and slightly folded or wrinkled; or (2) the swelling 
disappears entirely and the skin becomes reddish brown and wrinkled and is thrown 
into horizontal folds. The stage of wrinkling is the end stage of the disease. 

Our chart shows 21 cases with abnormal conditions of the skin of the lid; 12 
are males and 9 females; 4 have entropion and 17 no entropion. These figures almost 
certainly underestimate the number of cases present in the pedigree, since doubtless
a number of individuals in generations I and II had similar changes in the skin of 
the lids, and in generations III to V cases may be present amongst those who were 
not examined, and also in some members in generations IV and V the condition 
may not yet have appeared. 





The question arises, what is the significance, if any, of the presence in the same
pedigree of both entropion and these changes in the skin of the lids? A glance at the
small skeleton chart below (Fig. 14) shows at once the relationship of one to the other.
Besides the occurrence of each of these conditions separately there are at any rate four
cases in which they occur together in the same individual. If it be accepted that all
cases of spasmodic entropion require for their occurrence relaxation and redundance
of skin of the lid then all of the entropion cases in this pedigree must be classed with
the cases grouped under “changes in the skin of the lids,” which would now number
30. Twenty-four of these, in four generations, show direct inheritance from an affected
parent, and of the remaining five cases it cannot be said that the changes in the skin
of the lid were not directly inherited from their mothers, II, 2, III, 38 and IV, 60,
for they were not examined. It seems reasonable to suppose from the close association
of the changes in the skin of the eyelids and entropion that a factor is inherited
which in some way unknown is responsible for the presence of the former and that
the entropion is only a secondary manifestation. It may be thought that there are
no good grounds for the supposition for most of the changes recorded in the skin
were in the upper lids while the entropion was invariably in the lower lids. In a
proportion of the cases, however, definite changes were seen in the skin of the
lower lids and also quite possibly changes sufficient to permit the occurrence of
entropion may take place in the skin of the lower lid without showing marked
clinical manifestations. I am informed that fisher families in the same village that
do not belong to this pedigree are not subjects of hereditary entropion.

[Fig. 14. Skeleton chart of the pedigree; key: seen, not seen.]
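
The combined total of 30 quoted above follows from counts stated earlier in the paper (13 individuals with entropion, 21 with skin changes, 4 with both). A minimal check, assuming those four are the only individuals counted in both lists; the variable names are hypothetical:

```python
# Counts as stated in the text.
with_entropion = 13
with_skin_changes = 21
with_both = 4

# Inclusion-exclusion: individuals counted in both lists are subtracted once.
combined = with_entropion + with_skin_changes - with_both

entropion_only = with_entropion - with_both
skin_changes_only = with_skin_changes - with_both

print("combined:", combined)
print("entropion only:", entropion_only)
print("skin changes only:", skin_changes_only)
```

The 17 individuals with skin changes only agree with the "17 no entropion" reported for the chart.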

Summary. 

Reference is made to cases in the literature which have a bearing on various forms 
of change in the skin of the lids and on congenital entropion. Two cases not belonging 
to the pedigree are reported of epiblepharon or blepharochalasis with double lip. 

A pedigree with chart and photographs is shown which contains cases of spasmodic
entropion appearing at unusually early ages and cases with changes in the
skin of the lids. Three varieties of the latter are recognised. First a swelling in the 
form of fullness or bulging of the skin of the upper lid, second a skin fold of the 
upper lid, and third a crinkling of the skin of both upper and lower lids. The 
entropion is invariably transmitted from parent to child, usually from father to 
son or daughter, and similarly the above described changes in the skin of the lids, 
which occur in cases with or without entropion, show continuous descent*. 

Attention is drawn to the importance when examining folds of skin, or swellings, 
of the upper lid in relation to the lid margin of recognising the alterations in 
relative position that occur when the head is not erect, when the position of the
eyes is altered, when frowning occurs or when the eyebrows are raised and when
the examiner's position is altered.

Owing to the close association in the same pedigree and sometimes in the same 
cases of changes in the skin of the lids and entropion it is suggested that there is 
an hereditary factor that in some unknown way is responsible for the changed condition
of the skin of the lids and that the occurrence of entropion is only secondary.

My thanks are due to Dr T. MacHardy for much help in tracing out the pedigree 
and for information regarding deceased affected members, to Dr J. W. Macrae for 
showing me so many descendants of III, 38, I, 1 and I, 2, and to Dr William Brown
for the photographs shown in Plates I to IV, Figs. 2 to 12.

REFERENCES. 

Accardi. Blepharochalasis. Bollettino di Oculistica, No. 6. Firenze, 1925. Ref. Annales d'Oculistique,
T. clxv. p. 224. Paris, 1928.

v. Ammon, F. A. Klinische Darstellungen der angeborenen Krankheiten des Auges und der
Augenlider. 6, Tab. ii. Fig. xv. Berlin, 1841.

Arlt, C. F. Graefe-Saemisch Handbuch der ges. Augenheilkunde, Bd. iii. S. 454. Leipzig, 1874.

Ascher, Karl W. Blepharochalasis mit Struma und Doppellippe. Klinische Monatsblätter f.
Augenheilkunde, Bd. lxv. ii. S. 86. Stuttgart, 1920.

Ascher, Karl W. Das Syndrom Blepharochalasis, Struma und Doppellippe. Klinische Wochenschrift, Bd. i.
S. 2287 (Nov. 11). Berlin, 1922.

Aubineau, E. Les “Entropions congénitaux.” Annales d'Oculistique, T. clxv. pp. 161-169.
Paris, 1928.

* [An exception to this rule appears in V, 71, who inherits from his grandmother III, 45, through
a normal mother, IV, 60. Ed.] 





Bach, L. Ueber symmetrische Lipomatosis der Oberlider (Blepharochalasis?). Archiv f.
Augenheilkunde, Bd. liv. S. 73. Wiesbaden, 1906.

Benedict, William L. Blepharochalasis. Report of three cases. (Sect. of Ophthalmology, Mayo
Clinic). Journal of American Medical Association, Nov. 1926, p. 1735. Chicago, 1926.

Berry, George A. Diseases of the Eye, 2nd edition, p. 56. London, 1893.

Denig, R. Beitrag zur Aetiologie der angeb. Trichiasis. 71. Versamml. der deutschen Naturf. und
Aerzte in München. Abteil. f. Augenheilkunde, 1899. Ref. Nagel's Jahresbericht der
Ophthalmologie, Bd. xxx. S. 335. Tübingen, 1899.

Dimmer, F. Epicanthus und Entropium. Klinische Monatsblätter f. Augenheilkunde, Bd. xxiii.
S. [first digit illegible]08. Stuttgart, 1885.

Eigel, Walter. Blepharochalasis und Doppellippe ein thyreotoxisches Oedem? Deutsche
medizinische Wochenschrift, Bd. li. S. 1947 (Nov. 20). Leipzig, 1925.

Fehr. Ein Fall von Lidhaut-Erschlaffung, sog. Blepharochalasis. Centralblatt f. praktische
Augenheilkunde, Bd. xxii. S. 74. Leipzig, 1898.

Franceschetti, A. Kurzes Handbuch der Ophthalmologie (herausgegeben von F. Schieck und
A. Brückner), Bd. i. Berlin, 1930.

Friedenwald, J. S. Further note on Blepharochalasis. Archives of Ophthalmology, Vol. lii.
p. 367. New York, 1923.

Fuchs, E. Ueber Blepharochalasis (Erschlaffung der Lidhaut). Wiener klinische Wochenschrift,
Bd. ix. S. 109. Wien, 1896.

Ginestous. Un Cas de Blepharochalasis. Gaz. hebd. des Scienc. méd. de Bordeaux, 27 juillet, 1913.
Ref. Nagel's Jahresbericht der Ophthalmologie, Bd. xliv. S. 359. Tübingen, 1915.

Gomez, Marquez. Congenital Entropion. Archivos de Oftalmologia Hispano-Americanos,
Vol. xxx. p. 187. Madrid, 1930. Ref. Archives of Ophthalmology, Vol. iv. p. 648. New
York, 1931.

Graf. Oertliche erbliche Erschlaffung der Haut. Wochenschrift f. d. ges. Heilkunde, No. 15
(April 9). Berlin, 1836.

Guibert. Un Cas d'Entropion congénital double. Guérison. Archives d'Ophtalmologie, T. xii.
p. 101. Paris, 1892.

Harlan, George C. Transactions, American Ophthalmological Society, Vol. vii. p. 418. 31st
Annual Meeting, New London, 1895.

Heckel, Edward B. Blepharochalasis with Ptosis; Report of a Case. American Journal of
Ophthalmology, Vol. iv. p. 273. Chicago, 1921.

Hessberg, Richard. Ueber ein angeborenes familiäres Entropion beider Unterlider. Klinische
Monatsblätter f. Augenheilkunde, Bd. lxviii. i. S. 120-124. Stuttgart, 1922.

Himly, K. Die Krankheiten und Missbildungen des menschlichen Auges. Bd. i. S. 121-122.
Berlin, 1843.

Jenison, Nancy. Blepharochalasis. New York Medical Journal, Vol. cii. pp. 555-556. New
York, 1915.

Lafon, Ch. et Villemont. Blepharochalasis héréditaire avec Dacryadenoptose. Archives
d'Ophtalmologie, T. xxvi. p. 639. Paris, 1906.

Leblond, Étienne. Étiologie de l'entropion congénital. Archives d'Ophtalmologie, T. xxvii.
pp. 782-787. 1907.

Lodato, G. Blepharochalasis. Archivio di Ottalmologia, Vol. xi. Napoli. Ref. Archives
d'Ophtalmologie, T. xxiv. p. 770. Paris, 1904.

Loeser, L. Ueber Blepharochalasis und ihre Beziehung zu verwandten Krankheitsbildern nebst
Mittheilung eines Falles von Blepharochalasis mit Spontan-Luxation der Thränendrüse.
Archiv f. Augenheilkunde, Bd. lxi. S. 252. Wiesbaden, 1908.

Mackenzie, W. Diseases of the Eye. 4th edition, p. 187. London, 1854.

Meguro, T. Ein Fall von Fetthernien der beiden oberen Augenlider. Acta Soc. Op. Jap. Vol.
xxxiv. p. 13, 1930. Ref. Klinische Monatsblätter f. Augenheilkunde, Bd. lxxxvi. i. S. 843.
Stuttgart, 1931.


2—2 



20 Hereditary Entropion and Changes in the Skin of Eyelids 

Miglihtta, F. Su 1a Blefarocalasi (Contribute clinico ed anatomo-patologioo). (Clin. Oculist. 
Univ. Firenze.) Bollettino di Oculistica ) Yol. ix. pp. 1190—1214. Firenze, 1930. 

MOllkr, H. K. Kongenitales Entropium durch Epiblepharon, Klinische Monatsbldtter f. 4-ugen- 
hmlkunde^ Bd. lxxxvii. 8. 184. Stuttgart, 1931. 

Polstooochow, Entropion congenital familial de la Paupifere inf4rieure. Vestnik Oftalmologn, 
pp. 820—830. 1914. Ref. Archives dOphtalmologie , T. xxxv. p. 126. Paris, 1916—17. 

Rohmer. De rAngio-megalie sym&rique des Paupikres sup4rieures. Archives d*Ophtabnologie, 
T. xx. p. 407. Paris, 1900. 

Robenbteik. Centralblatt f praktische Augenheilkunde , Bd. xxvi. S. 233. Leipzig, 1902. 

Schmidt-R iMrLER, H. Fett-Hernien der oberen Augeniider. Centralblatt f. praktische Augenheil¬ 
kunde, Bd. xxiii. S. 297. Leipzig, 1899. 

Schorr, Ilse. Ueber Entropium spasticum des Oberlides. Klinische Monatsbldtter /. Augenheil¬ 
kunde , Bd. lxxvi. i. S. 373. Stuttgart, 1926. 

Scrini. Un Cas de Blepharochalasis (Ptosis atonique. Dermatolysie palp4brale). Archives 
dOphtalmologie , T. xxvi. pp. 440—446. Paris, 1906. 

Shoemaker, W. T. A Case of Bilateral Enlargement of the Lachrymal Glands. Annals of 
Ophthalmology , Vol. xin. p. 613. St Louis, 1904. 

Sichel, J. Annales dOculistique , T. XII. p. 188. Paris, 1844. 

Stephenson, Sydney. Note upon a form of Congenital Trichiasis. Transactions , Ophthcdmological 
Society of U. K ., Vol. xiv. p. 13. London, 1894 

Stikrbn, Edward. Blepharochalasis. Report of two Cases. Annals of Ophthalmology , Vol. xxiii. 
p. 625. St Louis, 1914. 

Sziklai, Paul. Zur Frago des Entropium congenitum. Zeitschrift f, Augenheilkunde , Bd. xxxvm. 
S. 103. Berlin, 1917. 

Tebbon, A. Dermatolysie palpdbralo. Archives dOphtalmologie , T. xxiv. p. 346. Paris, 1904. 

Verhoeff, F. II. and Friedenwald, J. S. Blepharochalasis. Archives of Ophthalmology , 
Vol. li. p. 654. New York, 1922. 

Wagenmann, A. Ein Fall von doppelseitiger echlor Ptosis adiposa bei einem 16 jahrigen jungen 
Mann. 34. Vers, der ophth. Oesellschaft , S. 274. Heidelberg, 1907. 

Wkidler, W. B. Blepharochalasis. Journal of A merican Medical Association , Vol. LXI. pp. 1128— 
1132, Sept. 27. Chicago, 1913. 

Weinstein, A. tlber zwei oigenartige Form on des Herabhangons dor Haut der Oberlider: Ptosis 
adiposa mit spontaner Senkung der Tranendrti.se. Klinische Monatsbldtter /. Augenheilkunde , 
Bd. xlvii. ii. S. 190. Stuttgart, 1909. 

Weve. Een Gevael van Syndrom Ascher. Nederl. oogheelkundig Jaarboek 1921. Ref. Ascher, 
Klinische Wochenschrift , Bd. I. S. 2287 (Nov. 11), Berlin, 1922. 

Wirths, M. Beiderseitige Lidgeschwulst, kombiniert mit Geschwulstbilduug dor Oberlippe. 
Zeitschrift f. Augenheilkwnde , Bd. xliv. S. 176. Berlin, 1920. 

Yan Chow. Ueber Entropium dor Neugeborenen. Klinische Monatsbldtter f. Augenheilkunde , 
Bd. lxxv. ii. S. 162. Stuttgart, 1926. 




[Plates II, III and IV of C. H. Usher, Hereditary Entropion, etc. (Biometrika, Vol. XXIV, Parts I and II), including Fig. 2, Fig. 11 (V, 64), Fig. 12 (V, 71) and Fig. 13 (iv.).]




THE GEOMETRIC PROPERTIES OF MICROSCOPIC 
CONFIGURATIONS. I. GENERAL ASPECTS 
OF PROJECTOMETRY. 


By WILLIAM R. THOMPSON. 

(From the Department of Pathology, Yale University.) 

For several years attention has been directed in this laboratory to a study of 
the geometric properties of configurations too small to be studied conveniently with 
the eye directly. The procedure has involved the projection of images of plane 
sections of the configurations which are subjected to measurement. For such procedure
the name Projectometry is suggested.

As an example of an application of Projectometry may be cited the estimation 
of the volume of an island of Langerhans in a pancreas. The method employed is 
similar to that used in the estimation of the volume of the hull of a ship from 
cross-sectional plans to scale. The areas of parallel plane sections at definite intervals 
along a given perpendicular axis are estimated by means of measuring the areas of 
their magnified images with a planimeter. With such an area as ordinate and 
position of the plane of section relative to the given axis as abscissa a point is 
located on a co-ordinate plane in each instance. If the cross-sectional area of the given 
configuration were so determined at all points along the given axis and the appropriate 
scale used in the co-ordinate system, then the area bounded by the curve so obtained 
and the axis of abscissae would be numerically equal to the volume of the configuration.
In practice, however, we employ a trend line as an approximation to the above-mentioned
curve; the simplest device, technically, being what is called a fitted smooth
curve, a curve drawn by inspection of plotted points as free as practicable from
personal prejudice. Errors thus introduced may be estimated roughly in most cases
in practice by sensibly independent attempts to perform the same operation. An
error of this sort is ultimately involved in all graphic methods but may usually be
restricted to a negligible magnitude. If convenient, any other scale of plotting may
be adopted, replacing the above ordinates and abscissae by $h_1$ and $h_2$ times their
value, respectively, and dividing the resultant area by $h_1 h_2$ to obtain the volume
mentioned above. The result is invariant with translation of the axis of ordinates,
which may be employed, therefore, whenever convenient. Obviously, areas measured
by planimeter must be converted to expressions in terms of the appropriate units;
e.g., if the area under the trend line is so measured (called the Graph Area) the
direct result of measurement is divided by a similar measurement of the area of
a unit square of the co-ordinate plane, or in either or both instances the single
measurement may be replaced by the mean of several independent observations.
In any given experience under discussion it is preferable that a uniform treatment
be employed.
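
The procedure just described may be illustrated numerically. The following is a minimal sketch only: a polygon through the plotted points (the trapezoidal rule) is assumed in place of a smooth curve fitted by inspection, and the function names, the section positions and the areas are hypothetical.

```python
def trapezoid_area(xs, ys):
    """Area under the polygon through (xs[i], ys[i]); xs must be increasing."""
    return sum((xs[i + 1] - xs[i]) * (ys[i + 1] + ys[i]) / 2.0
               for i in range(len(xs) - 1))


def volume_from_sections(positions, areas, h1=1.0, h2=1.0):
    """Estimate a volume from section areas taken along a perpendicular axis.

    positions, areas: true abscissae and ordinates (e.g. micra, square micra).
    h1, h2: plotting scale factors applied to ordinates and abscissae.
    """
    plotted_x = [h2 * x for x in positions]
    plotted_y = [h1 * a for a in areas]
    graph_area = trapezoid_area(plotted_x, plotted_y)
    # Dividing by h1 * h2 undoes the change of plotting scale.
    return graph_area / (h1 * h2)


# Hypothetical islet: sections 20 micra apart, areas in square micra.
positions = [0.0, 20.0, 40.0, 60.0, 80.0]
areas = [0.0, 3000.0, 5000.0, 3000.0, 0.0]
volume = volume_from_sections(positions, areas, h1=0.001, h2=0.5)
```

Whatever plotting scale is chosen, the quotient by the product of the two scale factors returns the same volume, here in cubic micra.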

In the following discourse the name islet will be used to designate a domain 
composed of a connected region in three dimensional space, possessing a finite 
volume, and its frontier. Obviously, the above-mentioned method may be applied 
to the estimation of the volume of such a configuration, provided that traces of 
images to scale can be made of sections of the islet intersecting a given axis 
perpendicularly at given intervals; or, at least, at intervals small enough to yield 
satisfactory approximation in the graph. 

Now, we may have under consideration a domain having the attributes of an 
islet as defined above, but of primary interest in a given discourse with respect to 
the islet or islets of a specific kind that it may contain. Such a domain will be 
called the organ and the islets of the specified kind which it contains will be called 
its islets in such discourse. In such systems (e.g. the pancreas and its islands of
Langerhans) many interesting properties may be studied*, it being assumed that
the islets are discrete and finite in number as in the case cited, such as the arithmetic
mean or other forms of average of the islet volumes, measures of dispersion,
the distribution, aggregate number and total volume of the islets of the organ, 
volume of the organ and various ratios of these such as the ratio of total islet 
volume to organ volume and of aggregate number of islets to organ volume. Of 
course, a method of counting all the islets and measuring the volume of each might 
be devised; but the time and effort involved in such an undertaking might be prohibitive.
In such cases the usual resort is to estimate certain of these quantities
by means of similar data obtained from a random sample or samples of the whole, 
the function of these samples so estimated being chosen so that the appropriately 
weighted mean of values so obtained from mutually exclusive random samples 
converges to the value of the same function of the whole universe of discourse as 
the sum of the included samples approaches the universe as a limit. Now, by 
definition, a random sample of a universe of a finite number of things is a sample 
such that the a priori probability of being selected is the same for each individual 
in the universe. However, it is possible to obtain nearly if not quite as satisfactory 
estimates of functions of a universe from samples not necessarily having this property, 
provided that the various a priori probabilities of selection of the individuals
actually selected in a given experience are known or at least their relative values, 
that the selection is uninfluenced by prejudice as to the final result, and that in the 
selection there is no individual in the universe whose probability of selection is zero. 
We may resort in such cases to a procedure of which the following is an illustration. 

* Since the present article was written, the author's attention has been drawn to two reports by
Wicksell (Wicksell, S. D., "The Corpuscle Problem," Biometrika, Vol. XVII. (1925), pp. 84–99, and
Vol. XVIII. (1926), pp. 161–172), which deal, respectively, in a comprehensive manner with the special
cases wherein the shape of the islets (or corpuscles) is spherical or elliptical.

Let U be a universe of n individuals, $\{A_i\}$, numbered from one to n; and let
$x_i$ be a real number* corresponding to $A_i$, for $i = 1, \ldots, n$. Then we define m, the
arithmetic mean of the quantities $\{x_i\}$, by

$$ m = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{1} $$

Now, let us replace the ordinary notion of sampling by the following notion of 
set selection. 

Let individuals be selected from U, one at a turn, noting the result of each
selection and then returning the individual to the same situation in regard to 
probability of selection at any turn as existed initially; i.e. let the probability of 
selecting any given individual of U be the same at each turn. 

Let $p_i$, a constant, be the a priori probability of selecting the individual
designated by $A_i$ at any turn; and let $f_i$ be the number of times $A_i$ has been
selected in a set, S, of r turns; where r may be any positive rational integer and
$i = 1, \ldots, n$.

Then, according to previous specification, we require that

$$ p_i \neq 0 \ \text{for every } i, \quad \text{and} \quad \sum_{i=1}^{n} p_i = 1 \tag{2} $$

Now in a set, S, of r turns, obviously, it may happen that for some value or
values of i, $f_i = 0$. Indeed, this is a necessity if r < n.

In any case, let us adopt the following definition for M(x), for which we shall use
M as an abbreviation.

$$ M(x) = \frac{\sum_{i=1}^{n} f_i x_i / p_i}{\sum_{i=1}^{n} f_i / p_i} \tag{3} $$

Now, as we have defined $p_i$ as the a priori probability of selection of $A_i$ at any
given turn, we have

$$ \lim_{r = \infty} \frac{f_i}{r} = p_i \tag{4} $$

Indeed, the assumption of the relation (4) suffices as a definition of $p_i$. Accordingly,
by dividing the numerator and denominator of the right member of (3) by r, we
obtain by (1) and (3)

$$ \lim_{r = \infty} M = m \tag{5} $$

This last property is necessary to any satisfactory estimate of the mean m, by 
sampling; but, as is the case with the mean of a random sample of U, here also we 
have no absolute knowledge of the error of the approximation. In either case we 
do not know how good is the convergence; but in either case we have the same 
resource, namely, to contrast several independent estimates (customarily not less 
than four), each being obtained according to the same technical plan. 

* Immediate extension to any system of magnitudes of uniform dimension (e.g. mass, volume, 
length, etc.) may be made, obviously. 
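
The convergence stated in (5) may be illustrated by a small simulation. This is a sketch under stated assumptions, not part of the original procedure: the values $x_i$, the probabilities $p_i$, the number of turns and the seed are hypothetical, and `set_selection_mean` is a name introduced here for illustration.

```python
import random


def set_selection_mean(x, p, r, rng):
    """Select r times with replacement using probabilities p; return M of (3)."""
    n = len(x)
    f = [0] * n                      # f[i]: number of times A_i was selected
    for i in rng.choices(range(n), weights=p, k=r):
        f[i] += 1
    # Only individuals actually selected (f[i] > 0) contribute to the sums.
    a = sum(f[i] * x[i] / p[i] for i in range(n) if f[i])
    b = sum(f[i] / p[i] for i in range(n) if f[i])
    return a / b


rng = random.Random(0)
x = [2.0, 4.0, 6.0, 8.0, 10.0]       # hypothetical magnitudes x_i
p = [0.05, 0.10, 0.15, 0.30, 0.40]   # unequal a priori probabilities, sum 1
m = sum(x) / len(x)                  # arithmetic mean of (1)
M = set_selection_mean(x, p, 200_000, rng)
```

With so many turns M should lie close to m, although the individuals with large $x_i$ are selected far more often than the others.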










It is of interest to note another property of M in relation to the same function
of mutually exclusive subsets of the given selection experience. Let there be
a dichotomous division of S into two subsets which we shall call S' and S''. Let
us provide furthermore

that M', $f_i'$ and r' be related to S'
and that M'', $f_i''$ and r'' be related to S''
as M, $f_i$ and r are related to S.

Then, obviously,

$$ f_i' + f_i'' = f_i \quad \text{and} \quad r' + r'' = r \tag{6} $$

Now, let

$$ a = \sum_{i=1}^{n} \frac{f_i x_i}{p_i} \quad \text{and} \quad b = \sum_{i=1}^{n} \frac{f_i}{p_i} \tag{7} $$

and let

a' and b' be related to S'
and a'' and b'' be related to S''
as a and b are related to S ... (8).

Then, by (3) we obtain

$$ M = \frac{a}{b}, \quad M' = \frac{a'}{b'}, \quad \text{and} \quad M'' = \frac{a''}{b''} \tag{9} $$

and, obviously, a' + a'' = a and b' + b'' = b, whence we have

$$ M = \frac{b' M' + b'' M''}{b' + b''} \tag{10} $$

which is the relation obtained if S' and S'' were merely subsets of S defined as
a set of b' + b'' numbers (or magnitudes), S' and S'' being mutually exclusive and
containing, respectively, b' and b'' of the elements of S, and M, M' and M'' being,
respectively, the arithmetic means of the elements of S, S' and S''.
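
The relation (10) may be checked on a small instance. The sketch below assumes hypothetical values, probabilities and selection counts for the two subsets; `a_b` is a helper name introduced only for this illustration.

```python
def a_b(f, x, p):
    """Return the sums a and b of (7) for the selection counts f."""
    a = sum(fi * xi / pi for fi, xi, pi in zip(f, x, p))
    b = sum(fi / pi for fi, pi in zip(f, p))
    return a, b


x = [2.0, 4.0, 6.0, 8.0]               # hypothetical values x_i
p = [0.1, 0.2, 0.3, 0.4]               # a priori probabilities p_i
f1 = [1, 0, 2, 3]                      # counts in the subset S'
f2 = [0, 2, 1, 4]                      # counts in the subset S''
f = [u + v for u, v in zip(f1, f2)]    # counts in S, as in (6)

a, b = a_b(f, x, p)
a1, b1 = a_b(f1, x, p)
a2, b2 = a_b(f2, x, p)
M, M1, M2 = a / b, a1 / b1, a2 / b2    # the three quotients of (9)
pooled = (b1 * M1 + b2 * M2) / (b1 + b2)   # right member of (10)
```

Here `pooled` agrees with `M` up to rounding, so that estimates from separate subsets may be combined with the weights b' and b''.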

Now, there is another extremely valuable property of M which is at once obvious
when we note that in (3) wherever $f_i = 0$ we need not know the value of $p_i$ as it has
been provided that $p_i \neq 0$. This means that the a priori probability of selection need
be known only for those individuals actually selected, and indeed only their relative
values need be known in order to calculate M(x) from the observed values of $x_i$ and
$f_i$ in the experience, S.
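
That only relative values of the probabilities are required may be seen in a short sketch. The observations are hypothetical, and `M_from_selected` is a name introduced here; the weights stand for any quantities proportional to the $p_i$ of the individuals actually selected.

```python
def M_from_selected(selected):
    """selected: list of (f_i, x_i, w_i) for individuals with f_i > 0,
    where w_i is any quantity proportional to p_i."""
    a = sum(f * x / w for f, x, w in selected)
    b = sum(f / w for f, x, w in selected)
    return a / b


# Hypothetical experience: three individuals were selected at least once.
obs = [(2, 5.0, 0.10), (1, 9.0, 0.30), (3, 7.0, 0.20)]
# The same experience with every weight multiplied by a common factor.
scaled = [(f, x, 10.0 * w) for f, x, w in obs]

M_abs = M_from_selected(obs)
M_rel = M_from_selected(scaled)
```

The common factor cancels between numerator and denominator, so the two results coincide; individuals never selected do not enter the calculation at all.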

The above is by no means the only case wherein the failure of the ordinary 
method of random sampling to be readily applicable may be offset by an application 
of a knowledge of the a priori probabilities of selection of individuals. The function, 
M, defined in (3), is of like value in the case where more than one individual is
selected at a turn, provided that the appropriate relative a priori probabilities of 
selection are used. 


We turn, now, to another feature of the study of the properties of islets in 
relation to an including organ in the general sense adopted above, namely, the 
estimation of population density and location of islets. Such a subject would be 
extremely difficult if not unmanageable were discourse not restricted to some 
arbitrarily designated principal point of an islet. Obviously, such a point should 
be defined uniquely. 










For the present, at least, let us assign this rôle to the centroid of the islet. This
point is determined uniquely and depends upon geometric properties only. In the 
case of an islet of uniform density, of course, the centre of gravity and centroid 
coincide. 

Accordingly, we shall define the population density of islets in a region by the 
ratio of the number of islet centroids lying in the region to the volume of the region. 
In general, where we may be dealing with a domain consisting of a region with 
part or the whole of its frontier, we shall say that an islet belongs to the domain if 
the domain contains the islet's centroid; and let $\phi$ be the population density defined
as equal to the number of islets belonging to the domain under discussion divided 
by the domain volume (provided, of course, that the domain has a volume). It may 
be convenient to subdivide an organ into a set of mutually exclusive domains of 
this sort (having the whole organ equal to their sum), and in practice we may deal 
with the regional part of such domains, which have the same volume, and the 
manner of their choice for study can be such that the probability, a priori , of any 
islet centroids lying on the frontiers of the regions is zero (the actual incidence 
being neglected, accordingly). 

Furthermore, in practice, we often resort to the estimation of a function by 
means of another function to which it bears a known or perhaps a tentatively 
assumed relation. In the application of the foregoing to the estimation of islands 
of Langerhans per unit volume at least two substitutes may be utilised in place 
of centroid location. 

Consider the organ to be cut by a set of parallel planes at intervals of 10 micra, 
and let us agree to accept data obtained from a given section (domain equal to the 
part of the organ lying between two such successive planes) as indicative of the 
mean value of a given function (such as $\phi$, the population density) in a given
region; i.e. to regard the section as a fair sample of a domain including it. Let
T be the total tissue volume of the section and N be the number of islets belonging
to the section. Then, by definition

$$ \phi = \frac{N}{T} \tag{11} $$

Now, T may be approximated by the product of the total tissue area in one of 
the planes of section and the distance (10 micra) between sections; but N is usually
a difficult number to ascertain, due to the difficulty of locating islet centroids with 
any satisfactory degree of approximation in many instances. As indicated above, 
however, there are two resources available. 

In the first place we may count the total number, H, of islets having any part 
in the section. This is simple if no islet has more than one discrete part in the 
section. Otherwise, we may estimate a correction to the total count of islet particles 
discrete in the section if the necessary data are available for a random sample of 
these islets.

Now, let $\eta$ be the number of sections in which any part of a given islet appears.
Then we assume that the probability that the centroid of this islet lies in an arbitrary
one of these sections is $1/\eta$. Accordingly, if the H islets be numbered from
1 to H and $\eta_j$ bear the same relation to islet number j in this set as does $\eta$ to the
islet mentioned above, then we may take N', defined by

$$ N' = \sum_{j=1}^{H} \frac{1}{\eta_j} \tag{12} $$

as an estimate of N. Of course, N' may in turn be estimated by means of a random
sample of the H islets. Whether such successive approximations are admissible or 
not must be decided in each case separately. 
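
The estimate (12) may be computed directly once the number of sections occupied by each islet is known. In the sketch below the counts $\eta_j$ are hypothetical and `estimate_N` is a name introduced for illustration; exact fractions are used merely to avoid rounding.

```python
from fractions import Fraction


def estimate_N(eta):
    """Sum of 1/eta_j over the H islets having any part in the section, as in (12)."""
    return sum(Fraction(1, e) for e in eta)


# Hypothetical counts: islet j appears in eta[j] sections in all.
eta = [1, 2, 2, 4, 4, 4, 4, 5]
H = len(eta)                 # islets having any part in the section
N_prime = estimate_N(eta)    # estimate of the islets belonging to it
```

Since every $\eta_j$ is at least one, N' can never exceed H; an islet extending through many sections contributes correspondingly little to the section of reference.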

We have, however, an alternative method of estimating N, if it is admissible to
assume that the difference, if any, between the probability that the centroid lies in
the arbitrary section and that for any other point (uniquely determined) which we
may substitute for the centroid is tolerable; i.e. we agree to tolerate resultant errors of
the substitution. In this case the number, H', of the above-mentioned H islets lying
at least in part in the section of reference, but not in a given adjoining section, may
be taken as an estimate of N. Obviously, as an adjoining section may be chosen in
two different ways we may obtain H'' as the homologue of H' for the alternate
choice. Accordingly, either H', H'' or $\frac{H' + H''}{2}$ may be taken as an estimate of N.

Now, in a given domain let N be the number of islets belonging to the domain,
v be their mean volume and I be the sum of their volumes. Let I equal the total
volume of islet particles in the domain. This last requirement is special but it may
suffice for estimates that it be merely approximated. Approximately in this case
and exactly in the ideal case given above, then we have the relations

$$ I = N \cdot v \tag{13} $$

where I is the total volume of insular tissue in the domain; and if T is the (total
tissue) volume of the domain, and $\phi = N/T$ as before, and we let $\delta = I/T$, then,
obviously,

$$ \delta = \phi \cdot v \tag{14} $$

Accordingly, the estimation of any two of the variables suffices to give an estimation
of the third in either (13) or (14), and if we have an estimation of T then all
these variables may be estimated. If all three variables in either equation may be
estimated independently these relations furnish a means of checking the results.
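
The step from (13) to (14) may be written out as follows. The derivation uses only the definitions given above; the numerical values in the last line are hypothetical and serve merely as an instance.

```latex
\begin{align*}
\delta &= \frac{I}{T}
        = \frac{N \cdot v}{T}      && \text{by (13)} \\
       &= \frac{N}{T} \cdot v
        = \phi \cdot v             && \text{by (11)} \\[4pt]
\text{e.g. } N &= 12,\quad v = 1.5 \times 10^{6}\ \mu^{3},\quad T = 6 \times 10^{9}\ \mu^{3}: \\
\phi &= 2 \times 10^{-9}\ \mu^{-3},\quad I = 1.8 \times 10^{7}\ \mu^{3},\quad
\delta = 3 \times 10^{-3} = \phi \cdot v .
\end{align*}
```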

Summary.

Some of the general aspects of the theory and practice of the quantitative study 
of geometric properties of microscopic configurations by means of projection have 
been discussed. Further elaboration is described in the succeeding communication 
together with applications. 






THE GEOMETRIC PROPERTIES OF MICROSCOPIC 
CONFIGURATIONS. II. INCIDENCE AND VOLUME 
OF ISLANDS OF LANGERHANS IN THE PANCREAS 
OF A MONKEY. 

By WILLIAM R. THOMPSON and RAYMOND HUSSEY, with the 
assistance of JOSEPH T. MATTEIS, WILLIAM C. MEREDITH, 
GEORGE C. WILSON and F. ERWIN TRACY 

Many attempts have been made to study geometric properties of islands of 
Langerhans in the pancreas; but, as is readily apparent in a critical survey of 
published results, the field is fraught with pitfalls to trap the unwary. One of 
these is the tacit assumption that, by proposed methods, random samples of islets 
in a region of the organ are obtained; another is the failure to take proper account
of the difference between counts of whole islets and counts of parts of islets which
lie in a given section; and a third is the laying down of criteria for estimation of
islet volume which may lead to estimates for the same islet differing by many 
hundred per cent, dependent upon an arbitrary choice of direction for cross-section, 
or to similar differences in estimate of volume in the case of two islets of identical 
size and shape but different orientation with respect to direction of axis of cross- 
section, or to instances in which the dimensions whose measurement is required 
may not even exist. 

The first, apparently, to avoid these pitfalls was Bensley*. By the nature of
his methods, however, he was obliged to confine his attention to making counts of
islets in tissue and estimations of the mass of the tissue in which they were contained.
In this manner he was able to estimate the number of islets per unit mass
in a pancreas as well as the total number of islets in the organ. As pointed out by 
Bensley*, however, the method he employed is subject to difficulty in application 
due to the fact that the staining technique is delicate (though yielding excellent 
contrast when correctly performed), and the time in which counts must be made is 
limited to a few hours. Clark†, in applying the methods of Bensley to a study of
the human pancreas, encountered such difficulties. 

* Bensley, R. R., Am. J. Anat. 1911–12, Vol. XII. pp. 297–388.

† Clark, E., Anat. Anz. 1913, Vol. XLIII. pp. 81–94.

‡ Thompson, W. R.: see pp. 21–26 of this issue.

As stated in the preceding communication‡, studies of the geometric properties of
microscopic configurations, particularly volumetric relations of islands of Langerhans
in the pancreas, have been conducted in this laboratory for several years. The
previous paper was designed to serve as an introduction to subsequent reports
and dealt with general aspects of the subject for which the name projectometry
was proposed. The present purpose is to present an application of methods 
suggested therein to a study of the islands of Langerhans in the pancreas of a 
monkey, and to contrast the results obtained by two independent methods of 
estimation of mean islet volume in a given region. The data to be employed were 
obtained from observations made by certain students* working in this laboratory.

* Matteis, J. T., Meredith, W. C., Wilson, G. C., and Tracy, F. E.

Technical Procedures.

Material for such observations was prepared in the following manner:

The pancreas of a presumably normal monkey (Macacus rhesus) under general
anaesthesia was removed and placed without delay in a 10 % solution of formalin.
Under formalin it was cut into twenty blocks and two small portions at either
extremity, which latter were excluded in the present observations (the cuts being
made approximately at right angles to the major axis of the organ). These blocks
were numbered in the order in which they lay originally in the organ from one to
twenty, number one block being that nearest to the extremity called the tail.
From each of the odd-numbered blocks 120 serial sections were cut, ten micra in
thickness, with a precision microtome, numbered in order of section from one to
120 and mounted, the sections so obtained being from a region of the block as
near as practicable to the face of the original section of the organ which was
nearest to the tail end of the pancreas. The odd-numbered sections were stained
with haematoxylin and eosin and covered with cover-slips in the usual manner.
Accordingly, there were 60 sections so prepared for each of the odd-numbered
blocks at approximately equal intervals along the major axis of the organ.

The process of selecting material for observation was as follows:

One section in each set of 60 (corresponding to an original block) was designated
as a master or principal section of reference. For this purpose section #59 was
taken unless it was found to be defective as to preparation to such an extent that
proposed operations could not be effected, in which case a suitable section as near
as possible to #59 was taken as master. In the experience to be reported #59 was
satisfactory in eight of the ten instances. A chart was made of each master section,
giving the location of each discrete region of insular tissue in it. A discrete part
of an islet lying in a given section will be called an insular particle, for convenience
in the following discussion; if the section is a master then the particle will be called
a master insular particle; and insular particles belonging to the same islet will be
called associate particles. Accordingly, in conjunction with the chart for each
master section, the insular particles belonging to it were enumerated. It should be
noted that in such enumeration no significance was attached to whether two given
master insular particles were also associate particles or not. This was deliberate
and will be taken effectively into account later. The procedure was then to select
by lot ten non-associate particles from each master section and to estimate the
volume of each associated islet by the projectometric method previously suggested*,
and to use the master section for total tissue and total insular tissue area measurements.
Furthermore, in subsequent developments we designate by Z the aggregate
number of insular particles in a given master section.

In each instance tissue areas were estimated by means of the projection and 
tracing of a magnified image with a projectiscope, the measurement of the image 
area by means of a planimeter, and the estimation of the corresponding mean area 
magnification. For the case of perfect magnification (i.e. with no distortion) the 
area magnification is equal to the square of the linear magnification. The projectiscope
used in this work, however, is subject to distortion of images (inherent
in its design). This distortion, however, is symmetrical with respect to the centre 
of the circular image field. 

The result of a short sequence of observations of mean linear magnification of
a straight line segment of given length, situated so that one end, $P_0$, of its image,
$P_0P_1$, is at the centre of the image field, is given in Table I and represented
graphically in Text-fig. 1 with a parabola fitted to the appropriately weighted
observations of mean linear magnification by the method of least squares. The
weighting is proportional to the square of the segment length (which is approximately
proportional to the precision of the measurement).

TABLE I.

r in cms.    p in cms.
0.001         0.587
0.002         1.175
0.003         1.752
0.004         2.320
0.005         2.915
0.006         3.502
0.007         4.085
0.008         4.672
0.009         5.272
0.010         5.870
0.011         6.482
0.012         7.095
0.013         7.712
0.014         8.342
0.015         8.980
0.016         9.630
0.017        10.272

The values of p listed are the means of four observations; r is treated as if exact.

We let r be the length
of a given segment (as described above) and p be the length of its image, $P_0P_1$
(both in cms.). Then the linear magnification factor, f, is given by

$$ f = \frac{p}{r} \tag{1} $$

and the above-mentioned parabola is given by

$$ f = k r^2 + h \tag{2} $$

where k and h are constants determined by the method of curve fitting indicated
above. Now it may be verified readily that the curve defined by

$$ p = k r^3 + h r \tag{3} $$

if fitted to the original data (as given in Table I) by the simple (equal weight)
method of least squares, gives exactly the same values to k and h as in (2); and
(1), (2) and (3) then form a consistent set of relations. By such calculation the
approximations, $h \approx 579.95$ and $k \approx 8.201\,(10)^4$ cm.$^{-2}$, were obtained.

* Thompson, W. R., loc. cit.
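
The equivalence of the two fittings may be verified numerically. The following sketch assumes the entries of Table I as transcribed here (the entry for r = 0.010 being read as 5.870), solves the two normal equations directly, and uses function names introduced only for illustration; the constants so obtained need not reproduce the published approximations exactly.

```python
r = [i / 1000.0 for i in range(1, 18)]            # segment lengths, cms.
p = [0.587, 1.175, 1.752, 2.320, 2.915, 3.502, 4.085, 4.672, 5.272,
     5.870, 6.482, 7.095, 7.712, 8.342, 8.980, 9.630, 10.272]


def solve2(s11, s12, s22, t1, t2):
    """Solve k*s11 + h*s12 = t1, k*s12 + h*s22 = t2 for (k, h)."""
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det


def fit_weighted_f(r, p):
    """Least squares fit of f = k r^2 + h to f = p/r with weights r^2."""
    f = [pi / ri for pi, ri in zip(p, r)]
    w = [ri ** 2 for ri in r]
    s11 = sum(wi * ri ** 4 for wi, ri in zip(w, r))
    s12 = sum(wi * ri ** 2 for wi, ri in zip(w, r))
    s22 = sum(w)
    t1 = sum(wi * fi * ri ** 2 for wi, fi, ri in zip(w, f, r))
    t2 = sum(wi * fi for wi, fi in zip(w, f))
    return solve2(s11, s12, s22, t1, t2)


def fit_unweighted_p(r, p):
    """Least squares fit of p = k r^3 + h r with equal weights."""
    s11 = sum(ri ** 6 for ri in r)
    s12 = sum(ri ** 4 for ri in r)
    s22 = sum(ri ** 2 for ri in r)
    t1 = sum(pi * ri ** 3 for pi, ri in zip(p, r))
    t2 = sum(pi * ri for pi, ri in zip(p, r))
    return solve2(s11, s12, s22, t1, t2)


k_w, h_w = fit_weighted_f(r, p)
k_u, h_u = fit_unweighted_p(r, p)
```

The two pairs of constants agree because weighting the squared residual of f by $r^2$ gives precisely the squared residual of p.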

[Fig. 1. Linear magnification factor f (ordinate, 575 to 605) plotted against r (abscissa), with the fitted parabola.]

However, the regions whose areas are to be estimated are irregular; and,
accordingly, a strict application of the information thus obtained is quite difficult.
In the case of islet measurements it is possible to place image areas so that their
centroid is approximately in the centre of the image field and roughly approximate
their areas by similarly placed circles whose area magnification would be $f^2$ if the
point $P_1$ lie on the circumference of the circle in the image. In the work to be
reported at present f was replaced by an approximation, F, taken for r = 0.01 cm.,
and the area magnification in such instances was approximated by $F^2$. A rough
estimate of the error thus involved may be made as follows, by estimating the
maximum error for the approximating circles' magnification so introduced.

From the data (given in the graph) we have that in the field

$$ 579.9 \leq f \leq 603.7 \quad \text{and} \quad F = 588.15 \tag{4} $$

Now, let $E = |F - f|$; then

$$ E < 15.6 \tag{5} $$

and the maximum relative error,

$$ \frac{E}{F} < \frac{15.6}{588.1} < 0.027 = 2.7\,\% $$


Accordingly, the relative error in estimating the magnification of these circular
areas cannot exceed 5.5 %, and this is taken as an indication of the greatest errors
to be expected in the present applications due to replacement of the variable f by
the constant F in our calculations. A similar approximation was used in the case
of total tissue area measurements.
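
The arithmetic of this bound may be retraced as follows. The sketch takes the limits of f and the value of F from (4); the passage from the linear to the area error by squaring is the step assumed here to connect 2.7 % with 5.5 %.

```python
f_min, f_max, F = 579.9, 603.7, 588.15

# Greatest departure of f from the constant F within the field, as in (5).
E_max = max(F - f_min, f_max - F)

# Greatest relative error of the linear magnification.
rel_linear = E_max / F

# Area magnification goes as the square of the linear magnification,
# so the relative error of F^2 is at most (1 + rel)^2 - 1.
rel_area = (1.0 + rel_linear) ** 2 - 1.0
```

The area error is thus a little more than twice the linear error, which accounts for the limit stated above.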

From the data obtained in the measurement of insular areas of a given islet in
the successive odd-numbered sections a graph may be made by location of points
in a Cartesian co-ordinate system having x, the distance in micra ($\mu$) from an
arbitrary point to the plane of section in a given instance, as abscissa, and y, the
total area in square micra ($\mu^2$) of the cross-section of the given islet in that plane,
as ordinate; and drawing a smooth curve (or a polygon) to represent these points
in the conventional manner.


Now, let $u$ be the volume of tissue in cubic micra ($\mu^3$) to which a unit square 
in this co-ordinate system corresponds; $K$ be the area of such a unit square and 
(in the same units) $O$ be the area (of the connex) bounded by the curve mentioned 
above and the axis of abscissae. Now, let $v$ be defined by 

$$v = u\,\frac{O}{K} \tag{6}$$


Then $v$ is the estimate of the islet volume in $\mu^3$ which is sought. Obviously, instead 
of the co-ordinates used above, any pair ($Ax$, $By$), where $A$ and $B$ are known 
constants, may be used, whereupon the variables defined above are related by 

$$v = \frac{uO}{ABK} \tag{7}$$
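
The mechanical integration just described may be imitated numerically. The following Python sketch is offered only as an illustration under stated assumptions: it takes the polygon option mentioned in the text (straight lines between plotted points, i.e. the trapezoidal rule) in place of a planimeter, and the function names and sample figures are hypothetical rather than taken from the measurements.

```python
def islet_volume(x_um, area_um2):
    """Area under the polygon through (x, y): an estimate of volume in cubic micra.

    x_um     -- positions of the planes of section, in micra (increasing)
    area_um2 -- cross-sectional areas of the islet in those planes, in square micra
    """
    if len(x_um) != len(area_um2) or len(x_um) < 2:
        raise ValueError("need two or more matching (x, y) points")
    total = 0.0
    for i in range(1, len(x_um)):
        # Trapezoid between successive sections.
        total += 0.5 * (area_um2[i - 1] + area_um2[i]) * (x_um[i] - x_um[i - 1])
    return total


def volume_from_graph(u, O, K, A=1.0, B=1.0):
    """Relations (6) and (7): v = u * O / (A * B * K); A = B = 1 gives (6)."""
    return u * O / (A * B * K)


# Hypothetical islet appearing in sections 20 micra apart.
x = [0.0, 20.0, 40.0, 60.0]
y = [0.0, 100.0, 100.0, 0.0]
print(islet_volume(x, y))
```

With the co-ordinates taken directly in micra and square micra (A = B = 1, and u equal to K), the polygon area is itself the volume estimate; `volume_from_graph` covers the case of a graph drawn to other scales.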


The areas mentioned above are estimated by means of a planimeter in the 
usual manner. If a planimeter graduated in square inches be used it may be 
convenient to adopt observed image areas as ordinates ($By$); then, if $F^2$ be the area 
magnification factor, we have 

$$B = \frac{F^2}{(2.54)^2} \cdot 10^{-8}.$$







32 Islets of Langerhans in the Pancreas of a Monkey 

Obviously, the calibration of instruments is not to be overlooked; and correction 
factors should be applied to readings wherever their neglect would lead to errors 
not to be tolerated. In the work herein reported most of these elements have been 
examined, but attention has been focused primarily upon investigation of 
reproducibility and consistence rather than upon absolute reliability, although the only 
major element of the various technical operations involved not to be so examined 
is that of the reliability of the microtome. The instrument used, however, was of 
the type known as the precision microtome. The precision of the method of volume 
estimation by successive mechanical integration as described above is indicated by 
the following data. Sections 5 micra in thickness were employed instead of 10 
micra thick sections as specified above, but these were arranged into four sets of 
serial sections at intervals of 20 micra. Thus it was possible to obtain four 
otherwise independent estimates of the volume of the same islet by the method in 
question with this slight alteration. The results were as follows: 


Set No.     v (scaled, in μ³)     Deviation
   1             19.06               1.62
   2             17.35              -0.09
   3             17.48               0.04
   4             15.87              -1.57

Mean = 17.44; Mean absolute Deviation = 0.83. 


These four observations, however, were made upon an islet of a human pancreas; 
but, in view of similarities of structure, the result is taken as indicative of what 
may be expected in the case of an islet of a monkey's pancreas. The a.d. is the 
mean absolute deviation from the mean of the observed values of $v$, of which it is 
approximately 4.8 %. 
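
The mean, the mean absolute deviation and its ratio to the mean may be recomputed from the four tabulated estimates. The sketch below uses Python's standard `statistics` module; the helper name `mean_absolute_deviation` is introduced here for illustration only.

```python
from statistics import mean

# The four otherwise independent estimates of v tabulated above.
volumes = [19.06, 17.35, 17.48, 15.87]


def mean_absolute_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean (the a.d.)."""
    values = list(values)
    centre = mean(values)
    return mean(abs(value - centre) for value in values)


m = mean(volumes)
ad = mean_absolute_deviation(volumes)
relative = ad / m

print(f"mean = {m:.2f}, a.d. = {ad:.2f}, a.d./mean = {100 * relative:.1f} %")
```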

For a given islet (in accord with the definition given above*) let $a$ be the number 
of associate particles in the corresponding master section, and let $\eta$ be the number of 
sections in the original set as cut (120 serial sections, 10 micra in thickness) in 
which any part of the islet appears. For 100 islets in the monkey's pancreas (ten 
from each of ten blocks, as described) the estimates of $v$ obtained by the method 
described above are given in Table II together with the corresponding values of an 
approximation of $\eta/2$ by the number of sections at intervals of 20 micra (the stained 
sections) in which any part of the given islet appears. In each instance in the 
present experience $a = 1$. 

We employ $M(v)$, a weighted mean of observed values of $v$, as an estimate of 
mean islet volume in a given region; the weighting being proportional to the 
reciprocal of the a priori probability of selection, in accord with the previous 
communication*. 


* Thompson, W. R., loc. cit. 






In order to facilitate such estimation let us assume that we may proceed (with 
only negligible differences in approximations) as if the following ideal situation 
existed : 

The organ was originally sectioned by a set of parallel planes at intervals of 
10 micra, of which the sections actually obtained in the above-mentioned experience 
would be a part; and that the a priori probability $p$ of selection of a given islet is 
the same as if the master section were chosen at random from the complete set, 
an insular particle of which was then chosen at random and the islet of which it is 
a part considered so to be selected. 


TABLE II. 

(v in 10^4 μ³)

  (Block #1)      (Block #5)      (Block #9)      (Block #13)     (Block #17)
    v     η/2       v     η/2       v     η/2       v     η/2       v     η/2
  382.1   13       28.2    4      135.0    7      110.7   10      227.1    8
   99.3    8      392.9    9       24.3    4       18.6    5       17.8    4
   24.5    4      192.8   11       22.3    4        8.1    3       49.1    6
   12.8    3       19.7    4       26.7    3        4.1    2       17.2    4
   47.1    5        7.9    3       49.5    6       44.3    6       26.6    6
  204.1    7       32.0    5       53.2    6       13.4    5       12.3    3
   32.9    4      228.7   13      232.1    9        8.7    3        9.8    2
   14.2    3      172.0    8      150.8    8       10.5    3       93.5    7
   45.3    7       40.9    6       65.2    7       40.7    6       31.2    5
   65.5    4      223.9    7      168.1    7       44.0    5      412.4   10

  (Block #3)      (Block #7)      (Block #11)     (Block #15)     (Block #19)
    v     η/2       v     η/2       v     η/2       v     η/2       v     η/2
  202.4    8       90.3    5       39.7    4       39.3    5       23.3    3
   10.5    4       51.9    5        6.4    2        6.8    3      129.2    8
   17.9    4       69.6    6        7.3    3       42.8    5      162.8    8
    7.8    3       39.0    3       24.6    6        8.7    2      107.7    9
   25.3    5       58.6    7       39.5    4       10.3    2       14.2    3
   27.2    4       38.0    8      140.3    7        6.4    3       32.7    5
   25.5    4       48.3    4       43.7    5       25.3    5       63.9    8
  148.6    8      116.0    8       84.6    6       10.3    3      796.5   13
  153.8    9       24.2    3      571.2   12       11.8    3        7.8    4
  382.6   13       40.5    3      205.9    8       10.4    2       83.0    7


Now, let $\eta$ be the number of sections of the set in which a part of the given 
islet appears, $a$ be the number of associate particles of this islet in a given section, 
and $Z$ be the number of insular particles in this section. Then it can be 
demonstrated rigorously that there exists a number $\lambda$ (the same for all islets in the set) 
such that 

$$\lambda p = \sum \frac{a}{Z} \tag{8}$$

where the summation is over the complete set of sections, except where $Z = 0$, if 
ever. 



However, the terms in this expression will all be zero except exactly $\eta$ of them. 
Accordingly, by the expression for the weighted mean as given in (3) of the previous 
paper* we have 

$$M(v) = \frac{\sum \dfrac{f\,v}{p}}{\sum \dfrac{f}{p}} \tag{9}$$

(where the summation is over the entire number of islets in the organ and $f$ is the 
number of times a given islet has been selected as described previously*), $p$ need 
not be known wherever $f = 0$ and $\lambda p$ may be substituted for $p$, obviously with the 
identical result. Furthermore, in the actual experience $f = 1$ or 0 in every instance†. 

Now, if in (8) it is tolerable (as is assumed in the present experience) to replace 
$\sum \frac{a}{Z}$ by the approximation $\frac{a}{Z}\,\eta$, where we take the values of $a$ and $Z$ for some 
particular one of the $\eta$ sections where $a \ne 0$ such as the master; then we have for 
$M(v)$ the expression 

$$M(v) = \frac{\sum \dfrac{Z}{a\,\eta}\, v\, f}{\sum \dfrac{Z}{a\,\eta}\, f} \tag{10}$$


where the summation is over the islets actually selected. In the present experience, 
as has been stated, $f = a = 1$, and for the means for a given block, as $Z$ is the same 
for the ten islets measured, it may therefore be eliminated from the formula. 
However, this is not permissible in the estimate of the weighted mean volume for 
the whole organ, for which, by (10) and Table II, we have (from the values of $Z$ 
given in Table III) 

$$M(v) = 5.66\,(10)^5\,\mu^3 \tag{11}$$
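
The weighting in (10) may be illustrated on the ten islets of Block 1. The Python sketch below is an illustration only: the function name and tuple layout are hypothetical, the values of v (in 10^4 μ³) and of η/2 are those read from Table II, and Z = 157 is the Block 1 entry of Table III; η is taken as twice the tabulated η/2, with f = a = 1 as stated in the text.

```python
def weighted_mean_volume(islets):
    """Weighted mean (10): each islet carries the weight Z * f / (a * eta).

    islets -- iterable of (v, Z, a, eta, f) tuples
    """
    numerator = 0.0
    denominator = 0.0
    for v, Z, a, eta, f in islets:
        weight = Z * f / (a * eta)
        numerator += weight * v
        denominator += weight
    return numerator / denominator


# Block 1 as read from Table II: v in 10^4 cubic micra, and eta/2.
BLOCK1_V = [382.1, 99.3, 24.5, 12.8, 47.1, 204.1, 32.9, 14.2, 45.3, 65.5]
BLOCK1_HALF_ETA = [13, 8, 4, 3, 5, 7, 4, 3, 7, 4]
BLOCK1_Z = 157  # insular particles in the master section (Table III)

block1 = [(v, BLOCK1_Z, 1, 2 * h, 1) for v, h in zip(BLOCK1_V, BLOCK1_HALF_ETA)]
M_block1 = weighted_mean_volume(block1)
print(round(M_block1, 1))  # in units of 10^4 cubic micra
```

Within a single block Z is common to all ten islets and cancels, so the result is simply the mean of v weighted by 1/η. It comes to about 6.0 × 10^5 μ³, in agreement with the Block 1 value of V in Table IV.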


TABLE III. 

Master          I              T            Z       ω        N          δ         φ
Section of  (in 10^6 μ³)   (in 10^9 μ³)                   (approx.            (in islets
Block No.                                                  by Z·ω)             per mm³)
    1           8.60           .287        157    .1052     16.5      .0300      57.5
    3           8.18           .272        134    .0986     13.2      .0301      48.5
    5           8.85           .246        128    .0874     11.2      .0360      45.5
    7           5.01           .250         86    .1105      9.5      .0201      38.0
    9           7.97           .264        148    .0916     13.6      .0302      51.5
   11           7.13           .335        150    .1109     16.6      .0213      49.6
   13           6.23           .295        119    .1267     15.1      .0211      51.2
   15           7.28           .343        128    .1716     22.0      .0212      64.2
   17           5.94           .313        116    .1118     13.0      .0190      41.5
   19          10.30           .466        289    .0911     26.3      .0221      56.5

 Totals        75.49          3.071                        157.0    δ0 = .0246, φ0 = 51.1


* Thompson, W. R., loc. cit. 

† The neglect of the probability of multiple selection of the same islet in this manner is obviously 
tolerable here. 









For the sake of brevity in the following discourse we let $V = M(v)$ as estimated 
for a given block, the values obtained being listed in Table IV. 


TABLE IV. 

Mean islet volume estimates for ten regions of a pancreas of a monkey, 
contrasting two technical methods of estimation as described in the text. 

Block No.        V               V'         (V − V')/(V + V')      V/V'
            (in 10^5 μ³)    (in 10^5 μ³)
    1           6.01            5.21              .07              1.15
    3           6.00            6.20             −.02              0.97
    5           9.16            7.90              .07              1.16
    7           5.18            5.27             −.01              0.98
    9           7.37            5.86              .11              1.26
   11           6.57            4.29              .21              1.53
   13           2.04            4.13             −.34              0.49
   15           1.40            3.31             −.41              0.42
   17           5.35            4.57              .08              1.17
   19           8.28            3.92              .36              2.11

Arithmetic
Means           5.74            5.07              .01        (Geometric Mean = 1.02)


Now, let us turn our attention to certain other functions of this material. Let 
$I$ be the total volume of insular tissue in a given region, and let $T$ be the total 
tissue volume in the same region; then we define $\delta$ by 

$$\delta = \frac{I}{T} \tag{12}$$

the ratio of insular to total tissue volume in the region. The observed values of $I$, 
$T$ and $\delta$ for the master sections of the ten blocks are given in Table III, where 
tissue volumes are estimated by the product of the thickness of the section (10 $\mu$) 
and the corresponding estimate of tissue area (projectometrically) in the section. 
In the same table are given the corresponding values of $\omega$, defined as the mean 
reciprocal of $\eta$ for the ten islets of the block previously selected for volume 
measurement. 


As indicated in the previous paper*, we may approximate $N$, the number of 
islets belonging to the master section† (in the sense there defined), by $Z\,\omega$; and 
$\phi$, the population density, by $\frac{Z\,\omega}{T}$. The values of these variables are given in 
Table III also. 


* Thompson, W. R., loc. cit. 

† Neglecting the difference between Z and H (as a = 1 in each of 100 instances). 







By means of these data we may obtain an independent estimate of mean islet 
volume in these regions, $V'$, given by 

$$V' = \frac{\delta}{\phi} = \frac{I}{Z\,\omega} \tag{13}$$

according to Equations (13) and (14) of the previous communication*. In Table IV 
are given the corresponding values of $V'$ together with those of $V$ and the functions 
of these variables defined by 

$$D = D_{(V,V')} = \frac{V - V'}{V + V'} \tag{14}$$

and 

$$R = R_{(V,V')} = \frac{V}{V'} \tag{15}$$

The mean of the values of $D$ in the ten instances is 0.01 with mean a.d. = 0.17; and 
the geometric mean of $R$ is 1.02, which means approximate the ideal values (zero 
in the first and unity in the second instance) much better than it seems reasonable 
to expect in view of known inherent errors of the method. Indeed, it seems 
improbable that another experience of this kind would lead to such close approximation, 
deviations of 0.05 or more from the ideal appearing more likely than not, although 
more precise results should be obtainable from more extensive investigations, 
wherein some of the roughness of successive approximations might be eliminated. 
As has been stated, however, the object of the present paper is to indicate methods 
by means of which such results may be obtained, employing data at present 
available merely for illustrative purposes. 
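
The summary figures for D and R may be recomputed from the block values of V and V'. The following Python sketch is illustrative only; the list names are hypothetical and the figures are those read from Table IV (in 10^5 μ³), so that small differences from the printed means arise from rounding.

```python
import math

# Block estimates as read from Table IV, in 10^5 cubic micra.
V = [6.01, 6.00, 9.16, 5.18, 7.37, 6.57, 2.04, 1.40, 5.35, 8.28]
V_PRIME = [5.21, 6.20, 7.90, 5.27, 5.86, 4.29, 4.13, 3.31, 4.57, 3.92]

# D = (V - V') / (V + V') and R = V / V' for each block.
D = [(v - vp) / (v + vp) for v, vp in zip(V, V_PRIME)]
R = [v / vp for v, vp in zip(V, V_PRIME)]

mean_D = sum(D) / len(D)
ad_D = sum(abs(d - mean_D) for d in D) / len(D)

# Geometric mean of R, computed through logarithms.
geo_mean_R = math.exp(sum(math.log(r) for r in R) / len(R))

print(f"mean D = {mean_D:.2f}, a.d. = {ad_D:.2f}, geometric mean R = {geo_mean_R:.2f}")
```

The geometric mean is the natural average for R, since interchanging V and V' replaces each ratio by its reciprocal, just as it changes the sign of each D.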

It may be noted that the function, $D_{(V,V')}$, is merely the deviation of $V$ from 
the mean of $V$ and $V'$ divided by this mean, i.e. 

$$D_{(V,V')} = \frac{V - \dfrac{V+V'}{2}}{\dfrac{V+V'}{2}} = \frac{V - V'}{V + V'}$$

and, obviously, $D_{(V',V)} = -D_{(V,V')}$. 

Now, in a manner similar to that in which $V'$ is taken as an estimate of mean 
islet volume in the block regions we may obtain $V_0'$ as an estimate of mean islet 
volume in the organ which may be compared with the estimate $M(v)$ given in (11). 

We obtain an estimate $\delta_0$ of the ratio of insular to total tissue volume in the 
organ by 

$$\delta_0 = \frac{\sum I}{\sum T} \tag{16}$$

where the summation is over the ten values in each instance as given in Table III. 
Similarly we obtain $\phi_0$, an estimate of population density for the whole organ, 
given by 

$$\phi_0 = \frac{\sum Z\,\omega}{\sum T} \tag{17}$$


* Thompson, W. R., loc. cit. 








Then, in accord with previous procedure, we define $V_0'$, the estimate of mean 
islet volume in the organ, by 

$$V_0' = \frac{\delta_0}{\phi_0} \tag{18}$$


From the data the value, $V_0' = 4.81\,(10)^5\,\mu^3$, was obtained, which approximates 
$M(v)$ as given in (11) as closely as may reasonably be expected in view of the 
roughness of the successive approximations (e.g. the approximation of $\eta$ by an even 
integer) and the great dispersion in islet volume which is obvious. Both are 
approximations of $\bar v$, the mean of the volumes of all islets in the organ. 
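
The whole-organ estimates follow directly from the totals of Table III. The Python sketch below is an illustration only, with hypothetical variable names; it assumes the totals as read from that table, namely 75.49 (10)^6 μ³ of insular tissue, 3.071 (10)^9 μ³ of total tissue (i.e. 3.071 mm³) and 157.0 islets.

```python
# Totals as read from Table III.
I_TOTAL_UM3 = 75.49e6    # sum of I, in cubic micra
T_TOTAL_UM3 = 3.071e9    # sum of T, in cubic micra
N_TOTAL = 157.0          # sum of Z * omega, the approximate number of islets

UM3_PER_MM3 = 1.0e9      # cubic micra in one cubic millimetre

delta_0 = I_TOTAL_UM3 / T_TOTAL_UM3                 # ratio of insular to total tissue
phi_0 = N_TOTAL / (T_TOTAL_UM3 / UM3_PER_MM3)       # islets per cubic millimetre
V0_prime = delta_0 / (phi_0 / UM3_PER_MM3)          # mean islet volume, cubic micra

print(f"delta_0 = {delta_0:.4f}, phi_0 = {phi_0:.1f}, V0' = {V0_prime:.3g}")
```

Since the total tissue volume cancels in the quotient, V0' is simply the total insular volume divided by the estimated number of islets.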


Discussion. 

The roughness of the estimation of $\eta$ by an even integer need not be a feature 
of subsequent work; and, indeed, the approximation of $\lambda p$ in (8) by $\frac{a}{Z}\,\eta$ is not 
necessary as the exact value of $\lambda p = \sum \frac{a}{Z}$ may be calculated in each instance if the 
several adjacent sections in which the measured islets have a part are examined 
for the data required under (8). Furthermore, we may use the method of estimation 
of $N$ by count (as suggested in the previous paper*) of islets having a part in the 
master section but not in a given adjoining section or preferably by the mean of 
the two estimates possible thus by alternative choice of an adjoining section. In 
this manner we may obtain estimates of $\delta$, $\phi$ and mean islet volume $\bar v$ in a region 
from only three successive sections instead of 120 as here employed, only 60 of 
which were stained however, thus reducing the cost and labour of preparation to 
less than 5 % of that in the present work in this regard. 

However, studies of probable dispersion including class-frequency relations 
would not so be possible, but the treatment of this phase of the subject will be left 
for another communication. 

Were the distances along an axis and the direction of sectioning planes at given 
points known (as could easily have been the case were the need appreciated at the 
time of cutting the blocks and sections) then it would be a simple matter to 
estimate total organ volume and total insular tissue volume by successive mechanical 
integration as in the case of islet volume; and, indeed, to estimate the total number 
of islets in the organ. 

The danger of ad hoc calculations has been kept well in mind throughout the 
present work. The theory upon which calculations are based contains no arbitrary 
constants fixed by design to attain the end result, and was elaborated without 
regard to the result in this particular instance. A somewhat more crude theory, 
first evolved, led to almost exactly the same result as given here, the agreement 
between the corresponding alternate methods of approximation of mean islet 
volume for the whole organ being even closer. Subsequently introduced refinements 
improved the theory but altered the resulting estimates of mean islet volume only 
to a negligible extent in the present experience, which was as expected. 

* Thompson, W. R., loc. cit. 






Summary. 

Methods have been presented and their application illustrated from material 
from the pancreas of a monkey (macacus rhesus) whereby $\bar v$, the mean volume of 
islands of Langerhans in a given region, may be estimated as well as the mean 
number of islets per unit volume of containing tissue, and $\delta$, the ratio of insular to 
total tissue volume. An alternative method of estimation of $\bar v$ based upon islet 
volume measurement and selection probability estimates in the case of non-random 
samples is treated in like manner, giving agreement as good as is to be expected 
from the extent and precision of available data. 

Thus, the two respective estimates of mean islet volume in the pancreas of the 
monkey to which reference has been made were $4.81\,(10)^5\,\mu^3$ and $5.66\,(10)^5\,\mu^3$; 
and the mean value of the ratio of insular to total tissue was found to be 0.0246, 
and the mean number of islets per cubic millimeter was approximately 51. 

Methods of estimation of total insular volume, total tissue volume and total 
number of islands in the organ are suggested although the necessary data are not 
available (in the present experience) for such calculations. 

The influence of distortion in projection has been studied and methods of 
restricting errors so introduced in microscopic area measurement have been 
presented. 



A BESSEL FUNCTION DISTRIBUTION. 


By A. T. McKAY, M.Sc, 

§ 1. Introduction . 

As the study of the distributions of the parameters of simple parent populations 
proceeds, the existence of new and often very complicated distributions is brought 
to light. These distributions, however, are sometimes only partially known in the 
sense that several of their positive moments or semi-invariants have, after much 
laborious effort, been determined. Naturally, the lack of explicit knowledge as to 
the precise functional expression of the distribution is not by any means entirely 
remedied by a mere fitting of the Pearson Type Curves, for there can possibly exist 
serious discrepancies between the fifth and higher moments of the fitting and the 
fitted curve. In view of this it would seem most desirable that the family of known 
distribution curves, i.e. curves whose functional form and all semi-invariants are 
known, which at present consist chiefly of the Pearson Curves, should be added to 
wherever possible. It is the purpose of the present paper, therefore, to bring to 
notice a most interesting Bessel Function distribution, the semi-invariants of which 
are all given in a surprisingly simple form. 

Further, it is known that the Pearson Type Curves do not always provide an 
entirely satisfactory fit to data, and it is therefore very useful to have at hand 
alternative fitting curves which can be tried on such experience. It is hoped that 
the Bessel Function distribution given herein may prove of service in fitting data 
which have Betas below the Pearson Type III line but which do not prove amenable 
to a fitting by a Pearson Curve. 

§ 2. The Distribution. 

The distribution which we are to consider is 

$$y = y_0\, e^{-cx/b}\, |x|^m \begin{cases} \pi\, I_m|x/b| \\ \quad\text{or} \\ K_m|x/b| \end{cases} \tag{1}$$

where $I_m$ and $K_m$ are the Bessel Functions of the second kind as defined by 
Watson*. The upper function must be employed when $|c| > 1$, in which case the 
distribution curve extends from 0 to $\infty$ if $c$ is positive or from 0 to $-\infty$ if $c$ is 
negative. When $|c| < 1$ the lower function is to be employed, in which case the 
distribution curve extends from $-\infty$ to $+\infty$. The quantity $b$ is a positive constant 
and $(m + \tfrac12) > 0$. The value of $y_0$ is given by 

$$y_0 = \frac{|1 - c^2|^{m+\frac12}}{\pi^{\frac12}\, 2^m\, b^{m+1}\, \Gamma(m+\tfrac12)} \tag{2}$$

* G. N. Watson, Theory of Bessel Functions. 







For the purpose of determining the moments of the distribution it is most 
convenient to discuss the two cases separately, viz. 

Case (i): $|c| > 1$, 

Case (ii): $|c| < 1$. 

It should be noted that in equation (1) only the index of the exponential term 
changes sign with $x$. 

§ 3. Determination of Moments. 

Case (i)*. 

Consider the expression 

$$E = \int_0^\infty e^{(a - c/b)x}\, x^m\, I_m(x/b)\, dx \tag{3}$$

where $a$ is a quantity entirely at our disposal and $c/b$ is positive. Now 

$$I_m(x/b) = \sum_{r=0}^{\infty} \frac{(x/2b)^{m+2r}}{\Gamma(r+1)\,\Gamma(m+r+1)} \tag{4}$$

hence substituting in (3) we have after integration 

$$E = \sum_{r=0}^{\infty} \frac{\Gamma(2m+2r+1)}{\Gamma(r+1)\,\Gamma(m+r+1)\,(2b)^{m+2r}\,(c/b - a)^{2m+2r+1}} \tag{5}$$

Now the Gamma-Function duplication formula is 

$$2^{2z-1}\,\Gamma(z)\,\Gamma(z+\tfrac12) = \pi^{\frac12}\,\Gamma(2z) \tag{6}$$

and writing $z = m + r + \tfrac12$ in the latter we find 

$$2^{2m+2r}\,\Gamma(m+r+\tfrac12)\,\Gamma(m+r+1) = \pi^{\frac12}\,\Gamma(2m+2r+1) \tag{7}$$

whence equation (5) reduces to 

$$E = \frac{(2b)^{m+1}}{2\pi^{\frac12}} \sum_{r=0}^{\infty} \frac{\Gamma(m+r+\tfrac12)}{\Gamma(r+1)\,(c - ab)^{2m+2r+1}} \tag{8}$$

$$= \frac{(2b)^{m+1}\,\Gamma(m+\tfrac12)}{2\pi^{\frac12}\,\{(c-ab)^2 - 1\}^{m+\frac12}}, \quad \text{since } |c| > 1 \tag{9}$$

We conclude therefore that if $\mu_r$ is the rth positive moment of the distribution 
about the origin, then $\mu_r$ is the coefficient of $a^r/r!$ in the expression $\pi y_0 E$, that is 
in the expression 

$$E_1 = \left[\frac{1 - c^2}{1 - (c - ab)^2}\right]^{m+\frac12} \tag{10}$$

Case (ii). 


Consider the expression 

$$E' = \int_{-\infty}^{\infty} e^{hx}\, |x|^m\, K_m|x/b|\, dx = b^{m+1} \int_0^\infty \left(e^{bhx} + e^{-bhx}\right) x^m K_m(x)\, dx, \quad \text{where } |bh| < 1 \tag{11}$$

Now it can be shown† that 

.( 12 ). 


* The device used here of multiplying by $e^{ax}$ and integrating or summing often proves most useful 
in dealing with distributions which have all the positive moments finite. For instance the method leads 
to the following neat form for the semi-invariants for the Binomial Distribution $(q+p)^n$. The rth 
semi-invariant $= n \cdot \log(1 + p\Delta) \cdot 0^r$. Since the differences have been tabulated, or in any case are quite 
easily found, the formula is a specially useful form. 

† G. N. Watson, loc. cit. 














where $F$ is the usual hypergeometric function notation. Whence we have 

where $z = (1 + bh)/2$ and $F_1(z)$ is the F-term of equation (12). Let us call the 
quantity in square brackets $G$ and consider it thus: 

It has been shown† that, provided $0 < z < 1$, 

$$\Gamma(c-a)\,\Gamma(c-b)\,\Gamma(a)\,\Gamma(b)\,F(a, b, c; z) = \Gamma(c)\,\Gamma(a)\,\Gamma(b)\,\Gamma(c-a-b)\,F(a, b, a+b-c+1; 1-z) + \Gamma(c)\,\Gamma(c-a)\,\Gamma(c-b)\,\Gamma(a+b-c)\,(1-z)^{c-a-b}\,F(c-a, c-b, c-a-b+1; 1-z) \tag{14}$$

whence substituting in this formula the appropriate values we find 

$$F_1(z) = A\, z^{-(m+\frac12)} - (1-z)^{m+\frac12}\, F_2(1-z) \tag{15}$$

where $A = \Gamma(m+\tfrac32)\,\Gamma(m+\tfrac12)/\Gamma(2m+1)$ 

and $F_2(z) = F(2m+1, 1, m+\tfrac32; z)$. 

Now applying (14) to expand $F_2(z)$ we derive 

$$F_2(z) = -F_2(1-z) + A/[z(1-z)]^{m+\frac12} \tag{16}$$

So from (13), (15) and (16) we find that $G = A$ and is independent of $z$. Whence 
Equation (11) reduces to 

$$E' = \frac{(2b)^{m+1}\,\pi^{\frac12}\,\Gamma(m+\tfrac12)}{2\,(1 - b^2h^2)^{m+\frac12}} \tag{17}$$

Writing $h = (a - c/b)$ and multiplying by $y_0$ we finally deduce as in Case (i) that $\mu_r$ 
is the coefficient of $a^r/r!$ in the expression $E_1$ given in Equation (10). 

§ 4. Semi-invariants and Criteria. 

We conclude then from Equation (10) that the rth moment about the origin of 
the distribution is 

$$\mu_r = \left[\frac{d^r}{da^r}\left\{\frac{1 - c^2}{1 - (c - ab)^2}\right\}^{m+\frac12}\right]_{a=0} \tag{18}$$

Whence if $\kappa_r$ is the rth semi-invariant we deduce 

(19). 

Hence we have 

$$\text{Mean} = \bar x = (2m+1)\,bc/(c^2 - 1) \tag{20}$$

$$\sigma^2 = (2m+1)\,b^2\,(c^2+1)/(c^2-1)^2 \tag{21}$$

$$\beta_1 = 4c^2(c^2+3)^2/(2m+1)(c^2+1)^3 \tag{22}$$

$$\beta_2 = 3 + 6\,(c^4 + 6c^2 + 1)/(2m+1)(c^2+1)^2 \tag{23}$$

$$\text{Criterion} = 12\,(c^2-1)^2/(2m+1)(c^2+1)^3 \tag{24}$$

From Equation (21) we see the necessity for the restriction that $(m + \tfrac12) > 0$, for the 
standard deviation must be positive. 

† E. T. Whittaker and G. N. Watson, A Course of Modern Analysis. 
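
As a check on Equations (20) to (23), the following Python sketch evaluates semi-invariants obtained by differentiating the logarithm of the expression in Equation (10) r times at a = 0. The closed form used, (m + ½)(r − 1)! b^r {(c − 1)^(−r) + (c + 1)^(−r)}, is a derivation made for this illustration and is not quoted from the paper; the function names and the trial values of m, b and c are likewise hypothetical.

```python
import math


def cumulant(r, m, b, c):
    """r-th semi-invariant derived from the logarithm of (10) (illustrative form)."""
    return (m + 0.5) * math.factorial(r - 1) * b ** r * (
        (c - 1.0) ** (-r) + (c + 1.0) ** (-r)
    )


def paper_formulas(m, b, c):
    """Mean, variance, beta_1 and beta_2 as given in Equations (20) to (23)."""
    mean = (2 * m + 1) * b * c / (c ** 2 - 1)
    variance = (2 * m + 1) * b ** 2 * (c ** 2 + 1) / (c ** 2 - 1) ** 2
    beta1 = 4 * c ** 2 * (c ** 2 + 3) ** 2 / ((2 * m + 1) * (c ** 2 + 1) ** 3)
    beta2 = 3 + 6 * (c ** 4 + 6 * c ** 2 + 1) / ((2 * m + 1) * (c ** 2 + 1) ** 2)
    return mean, variance, beta1, beta2


def from_cumulants(m, b, c):
    """The same four quantities built from the semi-invariants."""
    k1, k2, k3, k4 = (cumulant(r, m, b, c) for r in (1, 2, 3, 4))
    return k1, k2, k3 ** 2 / k2 ** 3, 3 + k4 / k2 ** 2


# One trial in each case: |c| > 1 and |c| < 1.
for trial in [(1.5, 0.7, 2.0), (0.25, 1.3, 0.5)]:
    print(trial, paper_formulas(*trial), from_cumulants(*trial))
```

Agreement of the two sets of figures in both cases is consistent with the statement that the same expression (10) serves for |c| > 1 and |c| < 1.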
















§ 5. Properties of the Distribution. 

Writing $\delta = (\beta_2 - 3)/\beta_1$ and $c^2 = t$, we find from Equations (22) and (23) 

$$2t\,(t+3)^2\,\delta - 3\,(t+1)(t^2 + 6t + 1) = 0 \tag{25}$$

which, when $\delta$ is given, is a cubic equation to find positive values of $t$. Further, 
from Equation (24), the criterion $\kappa_1$ is essentially positive, hence the distribution 
is leptokurtic and $\delta \ge 1.5$. The lines $\beta_2 = 1.5\,\beta_1 + 3$ (the Pearson Type III Line) and 
$\beta_2 = 1.57735\,\beta_1 + 3$, which can be called the "Bessel Line," are thus critical lines 
for the distribution. Unfortunately, these lines are too close together to be shown 
conveniently on a scaled diagram, but the following diagram shows their significance, 
besides indicating the occurrence of the roots of Equation (25). For a given value 
of $\delta$ the smallest positive root of (25) is $t_1$, the medium root $t_2$ and the largest $t_3$. 



Table I gives the values of $t_1$, $t_2$ and $t_3$ for a series of values of $\delta$ corresponding 
to the overlapping K-I region, while Table II gives the value of $t_1$ (the only root) 
corresponding to the K region. 

For values of $\delta$ greater than 3.00, the approximation 

$$t_1 = t_0 = \frac{6\delta - 7}{(6\delta - 7)^2 + (4\delta - 7)} \tag{26}$$

can be used. The error in using this formula is less than one in a thousand, though 
if greater accuracy is required the iteration formula 

$$t_n = \frac{3\,(t_{n-1} - 1)^2}{(2\delta - 3)\,(t_{n-1} + 3)^2} \tag{27}$$

readily produces the result to any degree of accuracy. 
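
The approximation and the iteration may be tried numerically. The Python sketch below is an illustration only, with hypothetical function names. It starts from the approximation (26) and repeats the step t = 3(t − 1)² / {(2δ − 3)(t + 3)²}, which is a rearrangement of Equation (25); a fixed number of repetitions is assumed in place of a convergence test, and the sketch is intended for the larger values of δ, the step not converging for values near 1.5.

```python
def cubic(delta, t):
    """Left-hand side of Equation (25)."""
    return 2 * t * (t + 3) ** 2 * delta - 3 * (t + 1) * (t ** 2 + 6 * t + 1)


def t_approx(delta):
    """Approximation (26) to the smallest positive root."""
    return (6 * delta - 7) / ((6 * delta - 7) ** 2 + (4 * delta - 7))


def smallest_root(delta, iterations=200):
    """Refine (26) by repeating the rearranged form of (25) a fixed number of times."""
    t = t_approx(delta)
    for _ in range(iterations):
        t = 3 * (t - 1) ** 2 / ((2 * delta - 3) * (t + 3) ** 2)
    return t


for delta in (2.0, 3.0, 5.0):
    root = smallest_root(delta)
    print(delta, t_approx(delta), root, cubic(delta, root))
```

For δ = 3.00 and δ = 2.00 the refined roots may be compared with the corresponding entries of Table II.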

It may be noted that several well-known distributions can be obtained as special 
cases by choosing the constants suitably. 

The Normal Distribution. Put $c = 0$, then, after writing $(2m+1)\,b^2 = \sigma^2$, make 
$b = 0$. 








TABLE I. 

Positive roots of the Equation (25) for values of δ between 1.500 and 1.57735 
at intervals of 0.002. 

     δ          t_1        t_2        t_3
  1.500       1.00000    1.0000       ∞
  1.502        .86837    1.1641     741.97
  1.504        .82119    1.2445     366.93
  1.506        .78744    1.3126     241.90
  1.508        .76050    1.3746     179.36
  1.510        .73784    1.4334     141.83
  1.512        .71817    1.4903     116.79
  1.514        .70072    1.5461      98.896
  1.516        .68501    1.6014      85.464
  1.518        .67069    1.6565      75.006
  1.520        .65753    1.7118      66.631
  1.522        .64536    1.7677      59.769
  1.524        .63399    1.8242      54.042
  1.526        .62336    1.8816      49.187
  1.528        .61336    1.9401      45.018
  1.530        .60392    2.0000      41.396
  1.532        .59498    2.0614      38.219
  1.534        .58648    2.1246      35.407
  1.536        .57839    2.1897      32.899
  1.538        .57067    2.2571      30.646
  1.540        .56328    2.3270      28.610
  1.542        .55620    2.3996      26.758
  1.544        .54941    2.4755      25.066
  1.546        .54287    2.5548      23.511
  1.548        .53658    2.6382      22.075
  1.550        .53051    2.7261      20.743
  1.552        .52465    2.8192      19.502
  1.554        .51899    2.9183      18.341
  1.556        .51352    3.0242      17.248
  1.558        .50821    3.1382      16.216
  1.560        .50307    3.2619      15.235
  1.562        .49808    3.3971      14.298
  1.564        .49324    3.5467      13.398
  1.566        .48853    3.7145      12.524
  1.568        .48396    3.9062      11.669
  1.570        .47950    4.1309      10.818
  1.572        .47517    4.4051       9.953
  1.574        .47094    4.7634       9.036
  1.576        .46682    5.3142       7.956
  1.57735      .46410    6.4641       6.4641


Pearson Type III Curve. Write $b/c$ = constant and then make $c = \infty$. Another 
case arises by writing $b = (c^2 - 1) \times$ a constant and then putting $c = 1$. 

The First Product Moment Coefficient in Samples of n drawn from an indefinitely 
large Normal Population. 

Writing c=» — p t 1, and x±v } we derive the distribution as found by 

Pearson, Jeffery and Elderton*. In this most interesting paper the first four 
moments and the resulting criteria have been determined. Since the curve for 

* Karl Pearson, G. B. Jeffery and Ethel M. Elderton, Biometrika, Vol. XXI. 






TABLE II. 

The positive root of the Equation (25) for values of δ between 1.58 and 3.00 
at intervals of 0.02. 

     δ         t_1             δ         t_1
   1.58      0.45889         2.30      0.14046
   1.60       .42418         2.32       .13802
   1.62       .39577         2.34       .13568
   1.64       .37184         2.36       .13341
   1.66       .35128         2.38       .13122
   1.68       .33333         2.40       .12910
   1.70       .31747         2.42       .12706
   1.72       .30330         2.44       .12508
   1.74       .29054         2.46       .12316
   1.76       .27897         2.48       .12130
   1.78       .26841         2.50       .11950
   1.80       .25872         2.52       .11776
   1.82       .24980         2.54       .11606
   1.84       .24154         2.56       .11442
   1.86       .23386         2.58       .11282
   1.88       .22671         2.60       .11127
   1.90       .22003         2.62       .10976
   1.92       .21376         2.64       .10829
   1.94       .20787         2.66       .10686
   1.96       .20232         2.68       .10547
   1.98       .19709         2.70       .10412
   2.00       .19214         2.72       .10280
   2.02       .18745         2.74       .10152
   2.04       .18300         2.76       .10027
   2.06       .17878         2.78       .099045
   2.08       .17475         2.80       .097854
   2.10       .17091         2.82       .096692
   2.12       .16725         2.84       .095559
   2.14       .16375         2.86       .094451
   2.16       .16040         2.88       .093370
   2.18       .15719         2.90       .092314
   2.20       .15412         2.92       .091282
   2.22       .15117         2.94       .090273
   2.24       .14833         2.96       .089287
   2.26       .14561         2.98       .088323
   2.28       .14299         3.00       .087378


Case (ii), that is $|c| < 1$, is but slightly more general (for $m$ is not more restricted 
than that $(m + \tfrac12) > 0$) than that considered by the above authors, the reader is 
recommended to refer to this paper, where he will find a selection of curves and a 
detailed discussion of some of their characteristics. 

The Distribution of the Mean of Samples of n drawn from an indefinitely large 
Exponential Distribution, the equation of which is given by 

$$y = \tfrac12\, h\, e^{-h|x|}.$$

Write $c = 0$, $m + \tfrac12 = n$ and $b = 1/nh$. It may be noted that in this case the 
Bessel Functions degenerate into a rather simpler form, since the order is integer 
minus a half. 




ON THE NORMALITY OR WANT OF NORMALITY 
IN THE FREQUENCY DISTRIBUTIONS OF CRANIAL 

MEASUREMENTS. 

By E. M. ELDERTON, D.Sc. and T. L. WOO, Ph.D. 

(1) It is well known that the general appearance of the distributions of 
anthropometric characters led Quetelet and afterwards Galton to the assumption 
that such characters were normally distributed. The normal curve was then 
introduced into anthropometric discussions, and became almost as much a fetish in 
anthropometry as in the theory of astronomical observations. 

It is, however, undoubtedly true that a number of anthropometric characters, if 
taken in not too large samples, roughly obey the normal law. With larger and 
larger populations the deviation becomes more and more obvious. 

Some recent craniometric investigations suggested that the approximate 
normality of a considerable number of anthropometric characters might be due to 
such characters depending upon a variety of elements of growth, and when the 
number of such elements was reduced there might be a distinct weakening in the 
normality of distribution. Thus while the stature of adults in samples of, say, a 
thousand may be approximately normal, cubit is less so, and head length or breadth 
still less so, although these latter cover several bones of the skull. This is more or 
less in accordance with the idea that the normality of a distribution arises from the 
action of a multiplicity of a large number of contributory causes each supplying a 
small amount of the variation. 

(2) It seemed possible to test this conception on a number of measurements on 
the bones of the skull made by Dr T. L. Woo. The measurements taken are those 
referred to in Dr Woo's paper "On the Asymmetry of the Human Skull."* The 
distributions of 50 measurements on single bones of the skull were found and the 
$\beta_1$ (or $\sqrt{\beta_1}$) and $\beta_2$ of these fifty cases were computed by Dr Woo. The problem 
therefore arises as to whether these 50 values of $\sqrt{\beta_1}$ and $\beta_2$ can be reasonably 
treated as those of random samples from a system of normally distributed variates. 
Unfortunately Dr Woo's $\beta$'s suffer from two defects when used for our present 
purpose: (i) They are obtained from populations which vary in size from 451 to 
887 individuals, the average size being 777, much nearer the upper than the lower 
limit; and (ii) they are for 25 homologous characters measured right and left. 
There is a high correlation between the $\beta$'s for the characters on the homologous 
bones, which in itself seems to indicate that the $\beta$'s are not random deviations from 
the normal values (0 and 3), but depend upon something characteristic of the bone 
on which the homologous measurements are taken. The correlation of the right and 

* Biometrika, Vol. XXII. p. 324 et seq. 





Normality of Cranial Measurements 

left $\sqrt{\beta_1}$'s is .8474 and of the right and left $\beta_2$'s is .6258. These correlations between 
left and right skewness and kurtosis seem in themselves to indicate that skewness 
and kurtosis are individual to the bone, and that the general system of $\beta$'s as found 
by Dr Woo cannot be treated as due to random sampling. Thus the second defect 
in the system of measurements is not without compensating advantage. The first 
defect appeared originally to be more serious than it really is. The distribution 
constants of $\sqrt{\beta_1}$ and $\beta_2$ on the hypothesis that they resulted from random sampling 
from normally distributed populations were first determined for the average size 
777 of the measurements, and the distributions of the observed $\beta$'s were found to be 
impossible on this hypothesis. But this left an opening for confirmed believers in the 
universality of the normal distribution to assert that the impossibility arose from 
the samples being of various sizes. Accordingly an attempt was made to correct for 
this variation in size. 

(3) The variation of the distribution curves of $\sqrt{\beta_1}$ and $\beta_2$ as depending on the
size of the sample is of two kinds: (i) that which depends on the size of the curve
or on its standard deviation. This difficulty could be got over by reducing each $\beta$ to
a common standard deviation, i.e. by multiplying it by the ratio of the standard
deviation of a sample of 777 to its own sample S.D. This of course will modify the


[Diagram I. Distributions of $\beta_2$.]







E. M. Elderton and T. L. Woo 


47 


ordinates of the curve, so that the $y$'s must be multiplied by the inverse ratio to
maintain the constant area which must in our case be equal to 50 units. (ii) Further,
not only does the dimension of the distribution curve depend upon the size of the
sample, but its shape does so also*. The problem accordingly arises as to how far
after change of size in (i), the changes in shape will be of great or small importance.

Diagram I shows the histogram of the uncorrected observed values of $\beta_2$,
and three curves of distribution of $\beta_2$ on the assumption that $n$ = the mean
value 777, the minimum value 451 and the maximum value 887. The mean of the
observed values is shown, but to avoid confusion only the mode and mean of the curve
of mean value. It will be observed that there is considerable separation of the
three curves of $\beta_2$ distribution, but it is clear that no curve in the region in which
they lie would fit the observed values. Diagram II shows the data corrected by
the method indicated, and the three curves reduced to the same standard deviation.
The curves are not absolutely of the same shape, although they have the same
S.D. and area; but they are so close together, and any curve lying between them,
i.e. with a value of $n$ between 451 and 887, will be so near to them, and all three
so near to the mean curve for $n = 777$, that we may fairly take that curve as the
theoretical distribution, which the corrected $\beta_2$'s ought to follow.

[Diagram II. Distributions of $\beta_2$ reduced to common mean. Curve when $n$ is mean value 777.]

* The $B_1$, $B_2$ of $\sqrt{\beta_1}$ and $\beta_2$ are functions of $n$, the size of the sample.




48

Even a cursory inspection of the diagram suffices to indicate that the $\beta_2$'s are
not a reasonable sample from this theoretical curve.

(4) Turning now to $\sqrt{\beta_1}$, its distribution follows a symmetrical curve and the
character of this curve largely depends upon $B_2$. This takes the values*:

                            n = 451      n = 777      n = 887
$B_2(\sqrt{\beta_1})$ =     3.075,703    3.044,926    3.039,505

Thus we see that the distribution of $\sqrt{\beta_1}$ will closely follow a Type VII, but
it will be very near to a normal distribution for all three cases, and accordingly it
will be adequate to adjust the standard deviations to a common standard, namely
that of $n = 777$. This reduction has been done and the modified histogram of the
observed values of $\sqrt{\beta_1}$ fitted to the appropriate curve in Diagram III.



Diagram III. 


It is clear from Diagrams II and III that it is impossible to look upon
Dr Woo's 50 values each of $\sqrt{\beta_1}$ and $\beta_2$ as corresponding with 50 random samples

See note on p. 47. 







49 


from material following the normal law of distribution. The conclusion is inevitably
forced upon us that the normal curve is inadequate to describe effectively
the distribution of measurements taken on single bones of the human skull.
Such distributions show a defect in platykurtic and an excess in leptokurtic
distributions from what would arise in samples from a normally distributed material.
Furthermore the skewness shows a marked trend in the positive sense, that is to
say the mean more frequently takes a higher than a less value than the mode; this
is not consistent with random sampling from a normal population. It may be linked
with the fact that variations in defect of the modal value must be largely limited
in the case of small bones*. Having indicated the general results of these
investigations, we may now point out the analysis by which they have been
obtained.

(5) The values of $\sqrt{\beta_1}$ and $\beta_2$ for the 50 series of measurements with the
numbers in each series are given in Table I. The first column gives the index

TABLE I. Dr Woo's Values for $\sqrt{\beta_1}$ and $\beta_2$.

Columns: Bone; Measurement; Unadjusted $\sqrt{\beta_1}$ (Right, Left); Adjusted $\sqrt{\beta_1}$ (Right, Left); Unadjusted $\beta_2$ (Right, Left); Adjusted $\beta_2$ (Right, Left).


+ •176,636 

+ •136,190 

+ •188,324 

+ *145,286 

3-386,011 

3*459,867 

3*40848 

3-48550 

K 

+ *158,138 

+ •128,279 

+ •168,883 

+ •136,996 

3*206,534 

3-013,689 

3-21817 

3-01395 

i\ 

+ *037,597 

- *004,648 

+ •037,043 

- *004,579 

2-980,196 

3*359,173 

2-98072 

3*35076 

A3 

+ •028,811 

- -042,832 

- -269,368 

+ •030,874 

- *045,899 

2-982,362 

3*012,825 

2*98067 

3-01305 

Pa 

- *094,144 

- -091,774 

- '262,588 

3-413,815 

3-179,417 

3*39949 

3-17328 

0 , 

+ •236,230 

+ •106,876 

+ *246,001 

+ •112,653 

3*077,681 

3-056,409 

3*08067 

3*05845 

0 , 

+ *398,165 

+ •268,491 

+ -418,280 
+ •091,739 

+ -282,066 

3*623,313 

2*953,096 

3-64898 

2*95070 

o. 

+ *087,327 
+ •093,172 

+ •160,650 

+ •168,766 

3-189,819 

3*424,293 

3-19733 

3*44163 

£ 

+ *099,974 

+ •098,327 

+ -105,506 

2-954,450 

3*121,385 

2*95187 

3*12654 


- *006,726 

+ *042,595 

- *007,053 

+ •044,662 

2*857,046 

2*983,474 

2*85079 

2-98235 

T , 

+ *149,535 

+ *152,579 

+ *158,268 

+ •161,490 

3*044,128 

2*997,727 

3*119,839 

3-04582 

3*12527 

Ti 

+ *042,170 

- -165,341 

+ •040,777 

-•150,210 

3-446,918 

2-99802 

3-42241 

% 

+ •121,357 

+ •106,378 

+ •128,802 

+ •112,904 

2*930,373 

2*958,346 

2*92620 

2*95669 

5 

ft 

+ *060,248 

+ •049,088 

+ •062,746 

+ •051,123 

3*017,157 

3*131,634 

3-01723 

3*13418 

-•031,082 

- *059,960 

-•032,311 
+ •009,786 

- *062,331 

2*850,937 

3*082,440 

2*84608 

3*08456 

MU 

+ •009,546 

+ •002,617 

+ •002,683 

3-571,235 

4-185,434 

3*58032 

4*20456 


- *290,128 

- *277,429 

- -279,004 

- *266,792 

4*267,608 

3*954,612 

4*20751 

3*90941 

Mxi 

-•206,880 
- -077,979 

- *199,216 

- -195,342 

-•188,106 

3*148,294 

3*273,137 

3*14064 

3*25864 

lfa’2 

+ •012,537 

- *063,731 

+ •010,246 

3*413,535 

3*197,307 

3*34125 

3*16376 

ira? 3 

+ •051,303 

+ •044,662 

+ •039,195 

+ •034,122 

2*919,089 

2*996,785 

2*94039 

3-00003 

Mx 4 

+ •246,201 

+ •257,146 

+ •206,540 

+ *215,722 

3*746,382 

3*485,451 

3*62929 

3*40983 

fa 

+ *145,442 

+ •130,073 

+ •139,959 

+ •125,169 
+ •070,762 

3*337,501 

3*078,842 

3*32202 

3*07540 

fa 

fa 

- -066,375 

+ •067,052 

-•068,992 

3*316,481 

2*930,254 

3-33045 

2*92659 

+ •039,711 

+ •135,358 

+ •038,294 

+ •130,627 

2*760,055 

3*018,663 

2*77095 

3*01805 

fa | 

+ •112,832 

+ •134,687 

+ •115,054 

+ •137,339 

2*868,398 

2-994,103 

2*86674 

2-99383 


* Cf. the positively skew curves obtained in the cases of age at scarlet fever attack, interest on
securities, or period after marriage at which divorce proceedings are taken, etc., where there is a definite
limit close to and below the modal value.

50


letter of the measurement, the second the number of measurements in the series,
the third the observed value of $\sqrt{\beta_1}$ with its sign following that of $\mu_3$, and the fourth
gives the observed value of $\beta_2$.
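The two constants tabulated for each series can be sketched as follows. This is a minimal illustration only: it assumes plain moments about the sample mean, without any grouping corrections, which may differ from Dr Woo's actual procedure, and the two data lists are hypothetical.

```python
def sample_betas(xs):
    """Return (signed sqrt(beta1), beta2) from moments about the sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    sqrt_b1 = m3 / m2 ** 1.5      # carries the sign of the third moment
    b2 = m4 / m2 ** 2
    return sqrt_b1, b2

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
skewed = [1.0, 1.0, 2.0, 2.0, 9.0]      # hypothetical data with a long upper tail
print(sample_betas(symmetric))
print(sample_betas(skewed))
```

A symmetrical series gives a zero skewness constant, while a series with a long upper tail gives a positive one, which is the sign convention described for the third column.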

The following formulae have been used in considering the distributions of $\sqrt{\beta_1}$
and $\beta_2$ as samples from a normal population. They presuppose the samples taken
from a normal population to be of constant size, a condition not fulfilled by Dr Woo's
material as explained above.

$$\text{Mean } \sqrt{\beta_1} = 0, \qquad \sigma_{\sqrt{\beta_1}} = \sqrt{\frac{6(n-2)}{(n+1)(n+3)}},$$

$$B_1(\sqrt{\beta_1}) = 0, \qquad B_2(\sqrt{\beta_1}) = 3 + \frac{36(n-7)(n^2+2n-5)}{(n-2)(n+5)(n+7)(n+9)} \qquad \text{(i)*}.$$

$$\text{Mean } \beta_2 = 3\,\frac{n-1}{n+1}, \qquad \sigma_{\beta_2} = \sqrt{\frac{24\,n(n-2)(n-3)}{(n+1)^2(n+3)(n+5)}},$$

$$B_1(\beta_2) = \frac{216}{n}\,\frac{(n+3)(n+5)(n^2-5n+2)^2}{(n-3)(n-2)(n+7)^2(n+9)^2},$$

$$B_2(\beta_2) = 3 + \frac{36\,(15n^6 - 36n^5 - 628n^4 + 982n^3 + 5777n^2 - 6402n + 900)}{n(n-3)(n-2)(n+7)(n+9)(n+11)(n+13)} \qquad \text{(ii)*}.$$


For Dr Woo’s material the minimum number of measurements in any series 
was 451, the maximum 887 and the average 777. 


The above constants for those three cases are as follows: 



                        n = 451                   n = 777                   n = 887
                        $\sqrt{\beta_1}$   $\beta_2$       $\sqrt{\beta_1}$   $\beta_2$       $\sqrt{\beta_1}$   $\beta_2$
Mean                    0           2.98673       0           2.99229       0           2.99324
Standard Deviation      .11458      .22689        .08754      .17406        .08197      .16311
$B_1$                   0           .449,323      0           .267,851      0           .235,714
$B_2$                   3.075,703   4.102,966     3.044,926   3.662,507     3.039,505   3.583,784


All three distributions can be described by Type IV curves in the case of $\beta_2$ and by
Type VII curves which are fairly close to the normal curve in the case of $\sqrt{\beta_1}$.
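The constants for the three sample sizes may be checked with a short sketch. It assumes that formulae (i) and (ii) take the standard forms for normal samples coded below; the function names are introduced here for illustration only.

```python
import math

def sqrt_b1_constants(n):
    """S.D. and B2 of sqrt(beta1) in normal samples of size n, as in (i)."""
    sd = math.sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))
    big_b2 = 3 + 36 * (n - 7) * (n**2 + 2*n - 5) / (
        (n - 2) * (n + 5) * (n + 7) * (n + 9))
    return sd, big_b2

def b2_constants(n):
    """Mean, S.D., B1 and B2 of beta2 in normal samples of size n, as in (ii)."""
    mean = 3 * (n - 1) / (n + 1)
    sd = math.sqrt(24 * n * (n - 2) * (n - 3) / ((n + 1)**2 * (n + 3) * (n + 5)))
    big_b1 = (216 / n) * (n + 3) * (n + 5) * (n**2 - 5*n + 2)**2 / (
        (n - 3) * (n - 2) * (n + 7)**2 * (n + 9)**2)
    poly = 15*n**6 - 36*n**5 - 628*n**4 + 982*n**3 + 5777*n**2 - 6402*n + 900
    big_b2 = 3 + 36 * poly / (
        n * (n - 3) * (n - 2) * (n + 7) * (n + 9) * (n + 11) * (n + 13))
    return mean, sd, big_b1, big_b2

for n in (451, 777, 887):
    print(n, sqrt_b1_constants(n), b2_constants(n))
```

For $n = 777$ this gives a standard deviation of about .0875 for $\sqrt{\beta_1}$ and a mean of about 2.9923 for $\beta_2$, in agreement with the tabulated constants.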

Dealing with $\sqrt{\beta_1}$ first, the theoretical distribution for $n = 777$ is the curve


/ \ -< 7176,031 

y - 4-5826 (l + .(iii), 

compared with the normal curve we have the following subrange frequencies: 

* See Biometrika , VoL xnx. p. 428. 








51 


Subrange Frequencies of Normal and Type VII Curves.

$\sqrt{\beta_1}$ (Central Value)   Normal Curve   Type VII Curve   Observed Values reduced
-.40                        .001           .001             –
-.35                        .004           .006             –
-.30                        .037           .040             1
-.25                        .212           .218             2
-.20                        .886           .873             2
-.15                       2.692          2.664             1
-.10                       5.956          5.924             1
-.05                       9.591          9.622             5
0                         11.240         11.301             5
+.05                       9.591          9.622            10
+.10                       5.956          5.924             6
+.15                       2.692          2.664            11
+.20                        .886           .873             3
+.25                        .212           .218             1
+.30                        .037           .040             1
+.35                        .004           .006             –
+.40                        .001           .001             1

Actual Total = 50         49.998         49.997            50


It will be evident from the above values that for our present purpose the Type VII
curve is equivalent to a normal curve, and that accordingly we can legitimately
pool our $\sqrt{\beta_1}$'s after the modification due to multiplying them by the factor
$\sigma_{777}/\sigma_n$ (the ratio of the standard deviation for a sample of 777 to that for the
sample's own size), and compare their distribution with the curve (iii). This is done graphically in
Diagram III: see p. 48. The discrepancy even for a sample of 50 is marked. The
differences of mean and standard deviation from the observed values are as follows:

                       Theoretical Value    Observed Value
Mean:                  0                    .04897
Standard Deviation:    .08754               .14112

Only 14 out of the 50 $\sqrt{\beta_1}$'s are negative and the variation is greater than the
theoretical value by more than 50%. If we apply the test of goodness of fit we find
$\chi^2 = 100.058$ and $P < .000,0005$ for 11 groups. Remembering, however, that the $\sqrt{\beta_1}$'s
of the left and right sides are highly correlated, one might reach another limit
by supposing we had only half the above frequencies, in which case $\chi^2$ would equal
50.029, but we still find $P < .000,0005$. We thus see that the values of $\sqrt{\beta_1}$ are
incompatible with random sampling from normal material.
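The size of the discrepancy can be gauged with a rough sketch. It is an illustration added here and not the authors' computation: it treats the 50 values as independent, which the right and left correlation noted in the text shows they are not, so the figures are at best an upper limit of significance.

```python
import math

n_values = 50
theoretical_sd = 0.08754    # sigma of sqrt(beta1) for n = 777, from the text
observed_mean = 0.04897
observed_sd = 0.14112
negatives = 14

# Standard error of a mean of 50 values, assuming independent draws.
se_mean = theoretical_sd / math.sqrt(n_values)
z_mean = observed_mean / se_mean

# Excess of the observed scatter over the theoretical scatter.
sd_ratio = observed_sd / theoretical_sd

# Chance of 14 or fewer negative signs out of 50 under a symmetrical law.
p_sign = sum(math.comb(n_values, k) for k in range(negatives + 1)) / 2 ** n_values

print(round(z_mean, 2), round(sd_ratio, 3), p_sign)
```

On these assumptions the observed mean lies nearly four standard errors from zero and the observed standard deviation is about 1.6 times the theoretical one.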

In regard to the correlation of the $\sqrt{\beta_1}$'s for the homologous measurements left
and right, we have the following constants, when we work with Dr Woo's uncorrected
data.


                                          Left        Right
Mean $\sqrt{\beta_1}$:                    .041,057    .056,880
Standard Deviation of $\sqrt{\beta_1}$:   .140,847    .140,951




52 



Correlation of Right and Left $\sqrt{\beta_1}$'s = .8474.

The mean of all fifty uncorrected values = .048,969 and their standard deviation
= .141,121.

This correlation in itself is a strong argument against the distributions being 
merely random samples from normal populations. 

(6) We now turn to the distribution of the $\beta_2$'s. Here we may determine in
the first place the correlation between homologous $\beta_2$'s. We find for the uncorrected
values:

                                    Left         Right
Mean $\beta_2$:                     3.216,696    3.194,405
Standard Deviation of $\beta_2$:    .304,413     .339,406

Correlation of Right and Left $\beta_2$ = .6258.

Now formula (ii) and the results in the table below it indicate that the mean
$\beta_2$, whatever $n$ may be, is less than 3, while the standard deviation for the shortest
series is less than .25. Hence it does not look even when the $\beta_2$'s are adjusted as if
we shall have a distribution corresponding to what may be anticipated in samples
from normal populations. The correlation between the $\beta_2$'s for the distributions of
measurements on homologous bones confirms what has been observed with regard
to the correlation of the $\sqrt{\beta_1}$'s, although the correlation coefficient is not so high,
i.e. both emphasise that the skewness and kurtosis of an individual measurement
are something peculiar to the bone on which the measurement is taken and are not
due to the variations characteristic of random sampling.

In order to test the effect on $\beta_2$ of the variation in the sizes of the series, curves
of Type IV were determined by the constants of the theoretical distributions for
totals of 451, 777 and 887. The equations to these curves are as follows for the case
of 50 samples:

(a) $n = 451$.

$x = .712,387 \tan\theta$, $\quad y = .2393855 \cos^{20.557,738}\theta \; e^{16.400\,\theta}$.

Origin at 2.357,179, Mode = 2.925,479,
and as already noted: Mean = 2.986,726.

(b) $n = 777$.

$x = .687,983 \tan\theta$, $\quad y = .0316555 \cos^{29.553,232}\theta \; e^{23.048,09\,\theta}$.

Origin at 2.416,795, Mode = 2.953,343,
and as already noted: Mean = 2.992,288.

(c) $n = 887$.

$x = .683,293 \tan\theta$, $\quad y = .01662195 \cos^{32.598,663}\theta \; e^{25.354\,\theta}$.

Origin at 2.427,065, Mode = 2.958,507,
and as already noted: Mean = 2.993,243.
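How the constants of such a curve follow from the mean, standard deviation, $B_1$ and $B_2$ may be sketched as below. The relations used are the usual moment-fit relations for a Pearson Type IV curve, which the text does not state; they, the function name and the sign convention (origin below the mean for positive skewness) are assumptions made for this illustration.

```python
import math

def type_iv_constants(mean, sd, b1, b2):
    """Moment fit of y = y0 cos^(r+2)(theta) exp(nu*theta), with x = a tan(theta)."""
    r = 6 * (b2 - b1 - 1) / (2 * b2 - 3 * b1 - 6)
    root = math.sqrt(16 * (r - 1) - b1 * (r - 2) ** 2)
    a = sd * root / 4
    nu = r * (r - 2) * math.sqrt(b1) / root
    origin = mean - a * nu / r          # origin lies below the mean here
    mode = origin + a * nu / (r + 2)
    return {"power": r + 2, "a": a, "nu": nu, "origin": origin, "mode": mode}

# Tabulated constants of beta2 for n = 777: mean, S.D., B1, B2.
curve_b = type_iv_constants(2.99229, 0.17406, 0.267851, 3.662507)
print(curve_b)
```

With the constants for $n = 777$ the sketch returns a power of the cosine near 29.55, $a$ near .688, an origin near 2.4168 and a mode near 2.9533, close to the values printed for curve (b).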





53 

The ordinates of these three curves were computed and they are plotted in
Diagram I along with the original unadjusted values of $\beta_2$. The “mode” and “mean”
are those corresponding to curve (b). It is clear that size of sample largely influences
the form of the curve, and although the results are very improbable on the basis of
any one of these curves, it is desirable to adjust the $\beta_2$'s so that we may have a single
curve for purposes of comparison. Now in the case of $\sqrt{\beta_1}$ all the corresponding
three curves were so nearly normal that we had little hesitation in adjusting our
$\sqrt{\beta_1}$'s by merely altering their values in the ratio of ${}_{777}\sigma_{\sqrt{\beta_1}}$ to ${}_{n}\sigma_{\sqrt{\beta_1}}$. We have
attempted a similar method in the case of $\beta_2$, but à priori we cannot be equally
confident of success, because the variation in $n$ changes the shape as well as the
size of the curve. Diagram II (p. 47) shows the distribution of the observed $\beta_2$'s
adjusted to a common mean and standard deviation. This is done in the following
manner. In the place of ${}_{n}\beta_2$ we take

777^2 + (J3* - 

and it is this value we have termed “the adjusted value” of the observed ${}_{n}\beta_2$.

A similar treatment has been applied to the abscissae of the three curves (a),
(b) and (c). This obviously does not change curve (b) at all, but it changes both
(a) and (c). In the former case the abscissae are stretched in the ratio of 2.99229 to
2.98673, and in the latter case squeezed in the ratio 2.99229 to 2.99324. In both
cases the ordinates or $y_0$'s must be adjusted in the inverse ratio to retain the constant
50-units area. Diagram II (p. 47) indicates that while the three curves are not
by this process brought into absolute agreement, they are sufficiently near to one
another to justify our using the theoretical curve for $n = 777$ to describe the adjusted
observed $\beta_2$'s, on the supposition that the originals arose from sampling from normal
populations. It is hardly needful to emphasise the badness of the fit!

The theoretical mean and standard deviation are 2.99229 and .17406, while
the observed are 3.19200 and .31718. Thus the system of $\beta_2$'s is more leptokurtic,
and far more variable than is to be anticipated in sampling from normal populations.

The comparison of the observed and theoretical frequencies may be made from
the table on p. 54.

The $\chi^2$ for 10 groups is 266.869, which gives a probability of less than .000,0005,
and even if we suppose the $\beta_2$'s from right and left sides perfectly correlated the
$\chi^2$ of 133.434 would also have a probability less than .000,0005.

(7) The general conclusion to be drawn from Dr Woo's 50 $\sqrt{\beta_1}$'s and 50 $\beta_2$'s
seems to be that their distributions differ markedly from such as would arise from
random sampling on the assumption that the distributions were each sampled from
a normal population. We are accordingly forced to the important conclusion that the
distributions of characters measured on the individual bones of the skull are not of
normal type, but rather that the skewness and kurtosis of such distributions are
peculiar to the individual measurement. It is possible that this can be accounted
for by the failure of these measurements on individual cranial bones to be the product



54 




Central 

Value 

Adjusted 
Observed Value 

Curve (6) 

2*45 



•on 


2-55 

2-65 

— 

> 

12}**« 

2’75 


1 

4*08 J 


2-85 

3 


9*48 


2-95 

11 


n-ft7 


3-05 

10 


10*14 


3*15 

7 


0*50 


3*25 

2 


3*37 


3*35 

6 


1*51 


3*45 

5 


*61 


3*55 

3*05 

n 

2 i 

l 3 

SI 0 ' 31 

3*75 



*03 


3*85 

— 


*01 


3*95 

4*05 

1 

3 

•0035 

•0015 

•0450 

4*15 



•0000 


4*25 

2 


• 0000 J 


Totaln 

GO 

49*9950 


of a large number of contributory causes, for in such cases the proof of the normal
law fails. We thus reach the suggestive conception that the simpler the organ
measured, the less likely we are to meet with normal distributions. A measurement
which depends upon growth from a few centres or even from a single centre of ossification
is little likely to have its $\beta_1 = 0$ and $\beta_2 = 3$. But what is it that fixes the $\beta_1$ and
$\beta_2$ in such a case? That is a problem of much interest and well worthy of study.




THE SAMPLING DISTRIBUTION OF THE THIRD MOMENT 
COEFFICIENT—AN EXPERIMENT. 

By JOSEPH PEPPER, M.A., B.Sc. 


1. The distributions of the mean and variance in samples from a normal
population are known exactly. This, however, is not the case with the sampling
distribution of the third moment coefficient

$$\nu_3 = \frac{1}{n}\,S(x - \bar{x})^3 \qquad (1),$$

where $x$ is an individual observation in a sample whose mean is $\bar{x}$ and size is $n$.
Although the equation to this sampling distribution is not yet known, we can, by
actual sampling experiments, obtain valuable indications as to what form of distribution
to expect.


2. In my paper on “Studies in the Theory of Sampling,”* I gave the mean and
standard deviation of the distribution of $\nu_3$ in sampling from any population. Their
values in the case when the population is normal, with mean at the origin and
standard deviation $\sigma$, may be written

$${}_{\nu_3}M_1 = 0, \qquad {}_{\nu_3}M_2 = \frac{6(n-1)(n-2)}{n^3}\,\sigma^6 \qquad (2).$$

The value of the third moment (and all the odd moments) of the distribution is zero,
while the fourth semi-invariant has been given in the general case of sampling from
any population by Dr R. A. Fisher†. These values of the moments for a normal
population are:

$${}_{\nu_3}M_3 = 0, \qquad {}_{\nu_3}M_4 = 108\,n^{-6}(n-1)(n-2)(n^2 + 27n - 70)\,\sigma^{12} \qquad (3).$$

This last value has also been obtained by C. C. Craig‡ (using semi-invariants) and
P. R. Rider§.

3. From these values of the first four moments we derive

$$\beta_1 = \frac{M_3^2}{M_2^3} = 0, \qquad \beta_2 = \frac{M_4}{M_2^2} = 3 + \frac{18(5n - 12)}{(n-1)(n-2)} \qquad (4).$$
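Equations (2) to (4) may be illustrated numerically. The sketch below takes $n = 10$ and $\sigma = 10$, the values used in the sampling experiment; the function names are introduced here for illustration.

```python
def nu3_moments(n, sigma):
    """Second and fourth moments of nu_3 in normal samples, equations (2) and (3)."""
    m2 = 6 * (n - 1) * (n - 2) / n**3 * sigma**6
    m4 = 108 * (n - 1) * (n - 2) * (n**2 + 27*n - 70) / n**6 * sigma**12
    return m2, m4

def nu3_beta2(n):
    """Closed form (4) for beta2 of the distribution of nu_3."""
    return 3 + 18 * (5*n - 12) / ((n - 1) * (n - 2))

m2, m4 = nu3_moments(10, 10.0)
print(m2, m2 ** 0.5, m4 / m2**2, nu3_beta2(10))
```

The ratio of the fourth moment to the square of the second agrees with the closed form (4), both giving 12.5 for samples of 10.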


* Biometrika, Vol. XXI. Dec. 1929, pp. 235 and 238.

† “Moments and Product Moments of Sampling Distributions.” Proc. London Math. Soc., Ser. 2,
Vol. XXX. Pt. 3 (1928), pp. 199–238.
‡ Metron, Vol. VII. 59.

§ “Moments of Moments.” Proc. National Academy of Sciences, Vol. XV. No. 5, pp. 430–434,
May 1929.







56 Sampling of Third Moment Coefficient 

Thus, if we wish to attempt to fit the distribution of $\nu_3$ in samples from a normal
population by a Pearson curve, using only the values of its first four moments, the
appropriate one would be Type VII, viz.:

$$y = y_0\left(1 + \frac{x^2}{a^2}\right)^{-m} \qquad (5).$$

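4. All the odd moments of this curve about its mean, the origin, are zero, while
its second and fourth moments, obtained by integration, are

$$\mu_2 = \frac{a^2}{2m - 3}, \qquad \mu_4 = \frac{3a^4}{(2m - 3)(2m - 5)} \qquad (6),$$

giving

$$\beta_1 = 0, \qquad \beta_2 = 3\left(1 + \frac{2}{2m - 5}\right) \qquad (7).$$

Further, if $N$ is the area of the curve, we have

$$y_0 = \frac{N}{a\sqrt{\pi}}\,\frac{\Gamma(m)}{\Gamma(m - \tfrac{1}{2})} \qquad (8).$$

Equating the values of $\beta_2$ in (4) and (7) we get

$$m = \frac{5}{2} + \frac{(n-1)(n-2)}{6(5n - 12)} \qquad (9).$$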

TABLE I.

Group        Positive   Negative      Group         Positive   Negative
0–50            42         49         1000–1050         4          3
50–100          42         31         1050–1100         5          2
100–150         42         36         1100–1150         4          2
150–200         26         21         1150–1200         1          7
200–250         26         30         1200–1250         2          4
250–300         28         15         1250–1300         1          1
300–350         15         17         1300–1350         1          4
350–400         19         12         1350–1400         1          4
400–450         15         15         1400–1450         1          0
450–500         11          2         1450–1500         3          1
500–550          9         16         1500–1550         0          0
550–600          7          8         1550–1600         0          0
600–650          4          3         1600–1650         1          1
650–700          7          8         1650–1700         1          1
700–750         11          9         1700–1750         2          1
750–800          7          4         1750–1800         2          0
800–850          7         15         1800–1850         0          0
850–900          5          3         1850–1900         2          0
900–950          4          2         1900–1950         1          0
950–1000         3          2         1950–2000         0          0

Outlying Frequencies.

Group        Positive      Group        Negative
2150–2200       1          2100–2150       3
2200–2250       1          2250–2300       1
3500–3550       1          2550–2600       1
                           3500–3550       1













57 


5. For the sampling experiment, samples of 10 from a normal population with
mean at the origin and standard deviation 10 units, were already available, obtained
by the use of Tippett's random sampling numbers.

From these samples, 700 values of the third moment coefficient $\nu_3$ were computed.
This was done by evaluating in the usual way, for each sample of 10,

$$n\nu_3' = p_3 = S(x^3),$$

and $$\nu_3 = \nu_3' - 3\nu_1'\nu_2' + 2\nu_1'^3 \qquad (10).$$

The resulting frequency distribution of $\nu_3$ is shown in Table I. The size of each group
is taken to be 50 units for convenience, the plus column referring to the frequencies
of $\nu_3$ occurring as positive values and the minus column to negative values of $\nu_3$.
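The computation of a single value by equation (10), and a repetition of the experiment in outline, might be sketched as follows. The random numbers come from Python's own generator with an arbitrary seed and are not Tippett's numbers, so the resulting frequencies will differ from Table I; the function names are hypothetical.

```python
import random

def nu3_from_raw_moments(xs):
    """Third moment about the mean via equation (10), from moments about the origin."""
    n = len(xs)
    v1 = sum(xs) / n
    v2 = sum(x**2 for x in xs) / n
    v3 = sum(x**3 for x in xs) / n
    return v3 - 3 * v1 * v2 + 2 * v1**3

def nu3_direct(xs):
    """Third moment about the mean computed directly from the deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 3 for x in xs) / n

rng = random.Random(1932)          # arbitrary seed, an assumption of this sketch
samples = [[rng.gauss(0.0, 10.0) for _ in range(10)] for _ in range(700)]
values = [nu3_from_raw_moments(s) for s in samples]
mean_nu3 = sum(values) / len(values)
var_nu3 = sum((v - mean_nu3) ** 2 for v in values) / len(values)
print(mean_nu3, var_nu3 ** 0.5)    # theory: mean 0, S.D. about 657.3
```

The two functions agree on any sample, which is the identity expressed by (10).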

6. For the determination of the constants of the Type VII curve, we have in
the experiment, $n = 10$, $N = 700$ and $\sigma = 10$, so that from (9) and (2) we obtain

$$m = 2.815,789, \qquad {}_{\nu_3}M_2 = .432 \times 10^6, \qquad \sigma_{\nu_3} = 657.3 \qquad (11).$$

These values give, from equation (6),

$$a^2 = 1.136,842 \times 10^6 \qquad (12).$$

We can now find $y_0$ in equation (8), which gives

$$\log y_0 = \bar{1}.7277501, \qquad y_0 = .53426 \qquad (13).$$

Thus, the curve which is to be fitted is

$$y = .53426\left\{1 + \frac{x^2}{(1.136,842)\,10^6}\right\}^{-2.815,789} \qquad (14).$$
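The constants of the fitted curve can be reproduced with a short sketch, assuming equations (9), (2), (6) and (8) in the forms $m = \tfrac{5}{2} + (n-1)(n-2)/6(5n-12)$, $a^2 = (2m-3)\mu_2$ and $y_0 = N\,\Gamma(m)/a\sqrt{\pi}\,\Gamma(m-\tfrac{1}{2})$. The function name is introduced here for illustration.

```python
import math

def type_vii_constants(n, sigma, total):
    """Constants m, a^2 and y0 of the Type VII curve fitted to nu_3."""
    m = 2.5 + (n - 1) * (n - 2) / (6 * (5 * n - 12))              # equation (9)
    mu2 = 6 * (n - 1) * (n - 2) / n**3 * sigma**6                 # equation (2)
    a2 = (2 * m - 3) * mu2                                        # from equation (6)
    log_ratio = math.lgamma(m) - math.lgamma(m - 0.5)
    y0 = total * math.exp(log_ratio) / math.sqrt(math.pi * a2)    # equation (8)
    return m, a2, y0

m, a2, y0 = type_vii_constants(n=10, sigma=10.0, total=700)
print(m, a2, y0)
```

The logarithm of the gamma function is used only to keep the ratio of the two gamma functions stable; for these small arguments the direct ratio would serve equally well.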


This curve, together with the observed frequency distribution of $\nu_3$, is illustrated
in Figure 1. The class interval in the histogram has been extended to 100 units
so as to smooth out the irregularities of the observed frequencies in the class intervals
of 50 units.


For the purpose of finding the moments of the observed distribution about its
mean, it is convenient to take the values of $\nu_3$ at $25s$ ($s = 1, 3, 5, \ldots$) as $+1$, $+3$, $+5$
and so on, and similarly for the negative values. The values of the first four moments,
in terms of the units of the sampled population, are

$${}_{\nu_3}M_1 = 0.3457 \times 25 = 8.64,$$
$${}_{\nu_3}M_2 = 609.87 \times 25^2 = (617.4)^2,$$
$${}_{\nu_3}M_3 = -1486.074 \times 25^3,$$
$${}_{\nu_3}M_4 = 2793105.4 \times 25^4.$$

The standard error of ${}_{\nu_3}M_1$ is $657.3/\sqrt{700} = 24.844$ and the standard error of $\sigma_{\nu_3}$ will be
given approximately by the formula

$$\sigma_\sigma = \frac{\sigma}{2}\sqrt{\frac{\beta_2 - 1}{N}},$$

where in our case, $\sigma = 657.3$, $\beta_2 = 12.5$, $N = 700$, so that the standard error of $\sigma_{\nu_3}$
is 42.12.
















Thus the mean and standard deviation of the observed distribution of $\nu_3$ differ
from their theoretical values by less than their standard errors.

Further, the $\beta_1$ and $\beta_2$ of the observed distribution are given by

$$\beta_1 = .0097, \qquad \beta_2 = 7.51.$$

The theoretical values of $\beta_1$ and $\beta_2$ are 0 and 12.5 respectively, so that the observed
$\beta_2$ is rather small.
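The comparison with the standard errors can be retraced from the moments given above. This is a sketch only; the variable names are introduced here, and the observed moments are taken in the working unit of 25 as printed.

```python
import math

N = 700
unit = 25.0                                             # working unit of the grouped values
m1, m2, m3, m4 = 0.3457, 609.87, -1486.074, 2793105.4   # observed moments, working units

observed_mean = m1 * unit
observed_sd = math.sqrt(m2) * unit
beta1_obs = m3**2 / m2**3        # independent of the unit
beta2_obs = m4 / m2**2           # independent of the unit

sigma_theory, beta2_theory = 657.3, 12.5
se_mean = sigma_theory / math.sqrt(N)
se_sd = sigma_theory / 2 * math.sqrt((beta2_theory - 1) / N)

print(observed_mean, observed_sd, beta1_obs, beta2_obs, se_mean, se_sd)
```

The observed mean (about 8.6) is well within its standard error of about 24.8, and the observed standard deviation (about 617.4) falls short of 657.3 by rather less than the standard error of about 42.1.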

7. To determine the goodness of fit of the Type VII curve to the distribution
of $\nu_3$, the observed frequencies $n_s'$ in Figure 1 and the corresponding frequencies $n_s$
of the curve, calculated from the formula

$$n_s = \tfrac{1}{24}\left(y_{s-1} + 22y_s + y_{s+1}\right),$$

are shown in Table II. The frequencies are divided into 30 groups; those corresponding
to positive and negative values of $\nu_3$ are shown in rows (1) and (3)
respectively.

TABLE II. 


Group 


1-2 

2—3 

3-4 

4—5 

5-6 

6-7 

7-8 

8-9 

9-10 

10—11 

11—12 

12-13 

13-14 

Over 14 

(i) < 

84 

08 

54 

34 

26 

16 

_ 

U 

18 

12 

7 

9 

5 

3 

2 

16 

(2)*. 

52*9 

800 

46*0 


33*5 


22*0 


13*5 




4*5 

3*5 

16*2 

(3) »/ 


57 

45 

29 

17 

24 

11 

13 

18 

4 

5 

9 

5 

8 



It will be seen that the frequencies in the group (0–1) differ considerably
from the corresponding $n_s$, and in fact contribute about one-third of the value of
$\chi^2$. This is given by

$$\chi^2 = S\left\{\frac{(n_s - n_s')^2}{n_s}\right\} = 90.24,$$


yielding a value P = 1746 x 10“®. 
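The share of the central group in the total may be checked roughly. The sketch assumes that 52.9 is the curve frequency for the group (0–1) on each side of the origin, and takes the observed frequencies for that group from the first two rows of Table I.

```python
def chi_square_term(observed, expected):
    """One group's contribution (n_s - n_s')^2 / n_s to chi-square."""
    return (expected - observed) ** 2 / expected

expected_0_1 = 52.9          # curve frequency read for the group (0-1)
observed_pos = 42 + 42       # Table I, positive values in 0-50 and 50-100
observed_neg = 49 + 31       # Table I, negative values in 0-50 and 50-100

central = (chi_square_term(observed_pos, expected_0_1)
           + chi_square_term(observed_neg, expected_0_1))
share = central / 90.24      # 90.24 is the total quoted in the text
print(round(central, 2), round(share, 3))
```

On this reading the two central groups contribute about 32 of the 90.24, that is rather more than one-third.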

Accordingly, the goodness of fit of the Pearson Type VII curve

$$y = y_0\left(1 + \frac{x^2}{a^2}\right)^{-m}$$

to the distribution of the third moment coefficient in samples from a normal population
is unsatisfactory, so that it appears, in this case, that the first four moments
do not furnish sufficient data for the determination of the best fitting curve.


8 . For any explanation of the bad fit of the Type VII curve, it would be 
desirable to compare the values of the higher moments of the distribution of the 
third moment with the experimental values. A method for finding the sixth and 







60

eighth moments was given by the work of Dr R. A. Fisher*. The quantity $\gamma$, whose
2nd, 4th and 6th moments are given in this paper, is equivalent to $\sqrt{n(n-1)}\,\sqrt{\beta_1}/(n-2)$
in samples. Further, Dr Fisher finds that “the moment of the distribution of $\gamma$ is
derivable by multiplying by

$$\frac{(n-1)^r}{(n-1)(n+1)\ldots(n+2r-3)\,\kappa_2^{\,r}}$$

the corresponding moment of the distribution of $k_3$†.”

Thus, from the value of $\mu_6(\gamma)$ it is found that
$$\mu_6(\nu_3) = 3240\,n^{-9}(n-1)(n-2)(n^4 + 84n^3 + 2695n^2 - 15168n + 20020)$$
and therefore

$$\beta_4(\nu_3) = \frac{15\,(n^4 + 84n^3 + 2695n^2 - 15168n + 20020)}{(n-1)^2(n-2)^2}.$$

By extending the work of Dr Fisher for the case of the eighth moment I found
that

$$\mu_8(\gamma) = \frac{136080\,n^4(n-1)^4\,(n^6 + 171n^5 + 13893n^4 + 580401n^3 - 5131014n^2 + 14132268n - 12932920)}{(n-2)^7(n+1)(n+3)\ldots(n+21)}.$$

This value has been checked by Dr Fisher who also evaluated it by his combinatorial
method. From it, I obtain

$$\beta_6(\nu_3) = \frac{105\,P_6}{(n-1)^3(n-2)^3},$$

where $P_6$ is the sixth degree polynomial in the above expression for $\mu_8(\gamma)$.
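The higher betas of the distribution of $\nu_3$ may be evaluated for the samples of 10 used in the experiment. A minimal sketch, with the polynomial coefficients read as 2695 and 15168 and the function name introduced here:

```python
def nu3_betas(n):
    """beta2, beta4 and beta6 of the distribution of nu_3 in normal samples of size n."""
    d = (n - 1) * (n - 2)
    beta2 = 3 + 18 * (5*n - 12) / d
    p4 = n**4 + 84*n**3 + 2695*n**2 - 15168*n + 20020
    beta4 = 15 * p4 / d**2
    p6 = (n**6 + 171*n**5 + 13893*n**4 + 580401*n**3
          - 5131014*n**2 + 14132268*n - 12932920)
    beta6 = 105 * p6 / d**3
    return beta2, beta4, beta6

print(nu3_betas(10))
```

For $n = 10$ these expressions give 12.5, 670.83 and 99225, the theoretical values entered in Table III.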

9. Table III below gives the theoretical and experimental values of the $\beta_2$, $\beta_4$
and $\beta_6$ of the distribution of $\nu_3$.

TABLE III.

              Theory      Experiment
$\beta_2$     12.5        7.51
$\beta_4$     670.83      139.04
$\beta_6$     99225       3702


Although we are ignorant of the values of the standard errors of $\beta_4$ and $\beta_6$, yet
the divergencies shown in Table III are so large that at first sight it appeared there
must be some fundamental error in the sampling. However, it must be remembered
that these higher theoretical betas of the distribution take account of values of $\nu_3$,
no matter how large, while in the experiment, a limited number of samples, namely
700, only can be used. The addition of one value of $\nu_3$ at a sufficient distance from
the origin would greatly increase the values of $\beta_4$ or $\beta_6$.

* $k_3 = \dfrac{n^2}{(n-1)(n-2)}\,\nu_3$ in samples.
† Proc. Roy. Soc. Ser. A, Vol. 130, No. A 812, p. 16.






61 


We may further note that $\beta_4$ and $\beta_6$ in the case of the Type VII curve are
given by

$$\beta_4 = \frac{5\beta_2^2}{6 - \beta_2}, \qquad \beta_6 = \frac{7\beta_2\beta_4}{9 - 2\beta_2},$$

which both take impossible negative values when $\beta_2 = 12.5$ as given by the theory*.
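The Type VII relations can be tried on the two values of $\beta_2$ that occur in this paper. The sketch assumes the relations $\beta_4 = 5\beta_2^2/(6-\beta_2)$ and $\beta_6 = 7\beta_2\beta_4/(9-2\beta_2)$; the function name is introduced for illustration.

```python
def type_vii_betas(beta2):
    """beta4 and beta6 of a Type VII curve in terms of its beta2."""
    beta4 = 5 * beta2**2 / (6 - beta2)
    beta6 = 7 * beta2 * beta4 / (9 - 2 * beta2)
    return beta4, beta6

print(type_vii_betas(3.3204))   # beta2 of sqrt(beta1) for n = 10
print(type_vii_betas(12.5))     # beta2 of nu_3 for n = 10: beta4 comes out negative
```

With $\beta_2 = 3.3204$ the relations give about 20.572 and 202.68, the Type VII entries of Table V, whereas with $\beta_2 = 12.5$ the denominator $6 - \beta_2$ is negative and no admissible $\beta_4$ results.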


10. It was next decided, as the sample values of $\nu_3$ were available, to evaluate
the $\sqrt{\beta_1} = \nu_3\nu_2^{-3/2}$ for each sample, and study the distribution of this quantity,
analogous to the $\gamma$ of Dr Fisher's paper. Theoretically, it is distributed symmetrically
and we further know the first eight moments, that is, the $\beta_2$, $\beta_4$ and $\beta_6$ of the
distribution. It is independent of the units of measurement, and the introduction
of the term $\nu_2^{-3/2}$ has the effect of bringing the outlying values of $\nu_3$ nearer the origin.

The frequency distribution of $\sqrt{\beta_1}$ is shown in Table IV, the size of the groups
being taken equal to 0.1, as most convenient. The mean, standard deviation, $\beta_2$,
$\beta_4$ and $\beta_6$ of the distribution are as follows:

Mean $\sqrt{\beta_1}$ = 0.0289,   $\sigma^2$ = 0.347067,
$\sigma$ = 0.5891,   $\beta_2$ = 3.3370,
$\beta_4$ = 18.89,   $\beta_6$ = 139.03.

The theoretical value of $\beta_2$ is given by

$$\beta_2 = \frac{3(n+1)(n+3)(n^2 + 27n - 70)}{(n-2)(n+5)(n+7)(n+9)},$$

which in the case of $n = 10$ gives $\beta_2 = 3.3204$, so that it was again decided to fit a
Type VII curve to the distribution. If the equation is, as before,

$$y = y_0\left(1 + \frac{x^2}{a^2}\right)^{-m},$$


TABLE IV.

Group       + ve    - ve
0–0.1        43      49
0.1–0.2      58      47
0.2–0.3      52      44
0.3–0.4      39      36
0.4–0.5      30      35
0.5–0.6      30      30
0.6–0.7      24      22
0.7–0.8      21      22
0.8–0.9      17      15
0.9–1.0      15      12
1.0–1.1      11       4
1.1–1.2       9       4
1.2–1.3       7       4
1.3–1.4       1       1
1.4–1.5       2       2
1.5–1.6       3       2
1.6–1.7       1       2
1.7–1.8       1       0
1.8–1.9       1       4


the values of $y_0$, $a$ and $m$ are given in section 4, in terms of the first four moments.
We thus obtain the Type VII curve to be fitted as

$$y = 499.21\left(1 + \frac{x^2}{6.9565}\right)^{-11.862318}$$

* It should be noted that for the Type VII curve which has been used for fitting the distribution,
the moments also become eventually impossible. This occurs with $\beta_{22} = 23\beta_2\beta_{20}/(33 - 10\beta_2)$, for the
denominator becomes negative, since $\beta_2 = 3.3204$.




62 



This curve, together with the histogram in which the size of the groups has been
extended to 0.2 to smooth frequency irregularities, is illustrated in Figure 2. As
is seen, the fit is remarkably good, in great contrast to the case of the $\nu_3$ distribution.
In finding the value of the goodness of fit, 26 groups have been used, the
last 7 groups on each side of the origin being clubbed together.

[Figure 2. Distribution of $\sqrt{\beta_1}$ in 700 samples of 10 from a normal population.]

The values of the
areas $n_s$ of the Type VII curve, corresponding to the groups in Table IV, were
calculated from the ordinates by the formula

$$n_s = \tfrac{1}{24}\left(y_{s-1} + 22y_s + y_{s+1}\right)$$

and are as follows:


Group    .05    .15    .25    .35    .45    .55    .65    .75    .85    .95    1.05   1.15   Tail
$n_s$    49.7   48.0   44.9   40.6   35.5   30.1   24.8   19.9   15.5   11.8   8.7    6.3    14.2


The mid-points of the groups have been given, and the last group gives the area in
the tail containing the last seven groups.

Evaluating in the usual way

$$\chi^2 = S\left\{\frac{(n_s - n_s')^2}{n_s}\right\}$$

for the 26 groups, I obtained $\chi^2 = 12.70$, giving the very high value of $P = 0.98$.
This excellent fit is to be expected when one compares the experimental, theoretical
and Type VII values of the moments shown in Table V. The values for the normal
curve have been added for comparison.




TABLE V.

              Experiment    Theory     Type VII    Normal
Mean          0.0289        0          0           0
St. Dev.      0.5891        0.5794     0.5794      0.5794
$\beta_2$     3.3370        3.3204     3.3204      3.0
$\beta_4$     18.89         18.99      20.5722     15.0
$\beta_6$     139.03        148.954    202.677     105.0


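The standard error of the mean is $.5794/\sqrt{700} = .0219$ and that of the standard
deviation is $\dfrac{.5794}{2}\sqrt{\dfrac{3.3204 - 1}{700}} = .0167$, so that the experimental values differ from
their actual values by approximately their standard errors. The standard errors of
$\beta_2$, $\beta_4$ and $\beta_6$ are given approximately by the formulae

$$\sigma^2_{\beta_2} = \frac{1}{N}\left(\beta_6 - 4\beta_2\beta_4 + 4\beta_2^3 - \beta_2^2\right),$$
$$\sigma^2_{\beta_4} = \frac{1}{N}\left(\beta_{10} - 6\beta_4\beta_6 + 9\beta_2\beta_4^2 - 4\beta_4^2\right),$$
$$\sigma^2_{\beta_6} = \frac{1}{N}\left(\beta_{14} - 8\beta_6\beta_8 + 16\beta_2\beta_6^2 - 9\beta_6^2\right).$$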

We must therefore make some estimate of the values of /S 8 , /Si 0 and /S M . As a 
guide, we may consider their normal values, which are known, and the Type VII 
values given by 

* 0 _ 33/8i*& 0 _ 6WA» 

P8 4-ft 1 Pl °*(15 — 4^i)(4 — £i) ’ Pl4 (7-£ft)(l8-5&)’ 

These are given in Table VI. 

TABLE VI.

                Normal      Type VII
$\beta_8$          945          2971
$\beta_{10}$     10395         63142
$\beta_{14}$   2027025      90109172


The estimates chosen were the mean value 1958 for $\beta_8$, 15,000 for $\beta_{10}$ and the
normal value of $\beta_{14}$. The standard errors are, on substitution of these values,

$$\sigma_{\beta_2} = .2143, \qquad \sigma_{\beta_4} = 3.243, \qquad \sigma_{\beta_6} = 31.$$

Therefore it appears that experiment, Type VII, or even the normal values are in
good accordance with theory.
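The arithmetic behind Table VI and the three standard errors can be retraced in a few lines. What follows is only a sketch added for illustration, not part of the original memoir: the function names are hypothetical, the inputs are the theoretical values of Table V, and the estimates of $\beta_8$, $\beta_{10}$ and $\beta_{14}$ are those chosen above.

```python
from math import sqrt

N = 700                               # number of samples in the experiment
B2, B4, B6 = 3.3204, 18.99, 148.954   # theoretical values from Table V


def type_vii_betas(b2, b6):
    """Higher betas of a Type VII curve with the given beta_2 and beta_6."""
    b8 = 3 * b2 * b6 / (4 - b2)
    b10 = 33 * b2**2 * b6 / ((15 - 4 * b2) * (4 - b2))
    b14 = 65 * b2**2 * b10 / ((7 - 2 * b2) * (18 - 5 * b2))
    return b8, b10, b14


def standard_errors(b2, b4, b6, b8, b10, b14, n):
    """Approximate standard errors of beta_2, beta_4, beta_6 (symmetrical case)."""
    var_b2 = (b6 - 4 * b2 * b4 + 4 * b2**3 - b2**2) / n
    var_b4 = (b10 - 6 * b4 * b6 + 9 * b2 * b4**2 - 4 * b4**2) / n
    var_b6 = (b14 - 8 * b6 * b8 + 16 * b2 * b6**2 - 9 * b6**2) / n
    return sqrt(var_b2), sqrt(var_b4), sqrt(var_b6)


# Type VII column of Table VI, starting from the Type VII beta_6 of Table V.
type_vii = type_vii_betas(3.3204, 202.677)
# Mean of the normal and Type VII values of beta_8, as adopted in the text.
b8_estimate = (945 + type_vii[0]) / 2
se_b2, se_b4, se_b6 = standard_errors(B2, B4, B6, 1958, 15000, 2027025, N)
print(type_vii, b8_estimate, se_b2, se_b4, se_b6)
```

Run as written, the sketch should agree with the tabled figures to the number of places printed in the text, which is a useful check on the reconstruction of the formulae from the damaged page.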






11. The normal curve having the same mean and standard deviation as the
theory was also fitted. The curve is

$$y = \frac{700}{.5794\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(x/.5794\right)^2}.$$

[Figure 3. Distribution of $\sqrt{\beta_1}$ in 700 samples of 10 from a normal population.]

This curve with the histogram is shown in Figure 3. The values of the areas $n_s$
of the normal curve corresponding to the same 26 groups are given below:


Group   .05    .15    .25    .35    .45    .55    .65    .75    .85    .95    1.05   1.15   Tail
n_s     47.96  46.56  43.87  40.13  35.63  30.71  25.70  20.87  16.46  12.59  9.36   6.75   13.41


The value of $\chi^2$ in this case is 13.53, yielding $P = 0.97$, again an exceedingly high
value.
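The areas of the normal curve tabled above can be recomputed directly from the normal probability integral. The sketch below is an illustration only: it assumes groups of width 0.1 measured from the origin, twelve groups on each side followed by a clubbed tail, and it uses the error function of the standard library in place of the quadrature formula of the text. The function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(x, sd):
    """Probability integral of a normal curve with zero mean and the given s.d."""
    return 0.5 * (1.0 + erf(x / (sd * sqrt(2.0))))


def expected_half_frequencies(n_samples=700, sd=0.5794, width=0.1, n_groups=12):
    """Expected frequencies on one side of the origin, plus the clubbed tail."""
    areas = []
    for k in range(n_groups):
        lower, upper = k * width, (k + 1) * width
        areas.append(n_samples * (normal_cdf(upper, sd) - normal_cdf(lower, sd)))
    tail = n_samples * (1.0 - normal_cdf(n_groups * width, sd))
    return areas, tail


areas, tail = expected_half_frequencies()
for k, area in enumerate(areas):
    print(f"group mid-point {0.05 + 0.1 * k:.2f}: {area:.2f}")
print(f"tail: {tail:.2f}")
```

The twelve areas and the tail together make up half the 700 samples, so their sum gives a quick check on the grouping.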


Conclusion. The satisfactory results given by the distribution of $\sqrt{\beta_1} = \nu_3\,\nu_2^{-3/2}$ in
samples from a normal population indicate that it is better to work with this ratio
than the third moment coefficient $\nu_3$, whose distribution is so much more irregular.
As the values of all the betas of the distribution of $\sqrt{\beta_1}$ tend to the normal values
as the size of the sample increases, and the normal curve fits practically as well as
the Type VII curve to the distribution, it would appear desirable in practice to take
the normal curve as representing the distribution of $\sqrt{\beta_1}$ in samples, failing the
discovery of the actual curve.




FURTHER CONTRIBUTIONS TO THE SAMPLING PROBLEM. 


By N. St. GEORGESCU, D.Stat. Paris, Lecturer in the Bucharest School of
Statistics, Board of Census, Bucharest.

1. Many important researches in statistics have lately dealt with the moments 
and product moments of sample distributions. Although, except in the case of the 
mean and the standard deviation, no law of distribution has been found for any 
higher moment or semi-invariant, the general problem has been elucidated in a 
great many points. 

A few years ago C. C. Craig, in a very interesting paper, indicated several 
procedures for calculating the product moments and the semi-invariants of sampling 
moments. At the same time he gave a synthesis of the most important results 
already obtained, by previous authors, on the sampling problem. That is why, for 
bibliographic information, the reader is advised to refer to Craig’s paper*. 

Later, R. A. Fisher gave a practical procedure, based on combinatory analysis,
for calculating the cumulative moments of the sampling cumulative moments†.
Still more recently, the same author found the first few moments of many
important measures of the departure from normality‡.

As an illustration of the use of such results I may quote a paper in which
E. S. Pearson has calculated two tables of probabilities for the distributions of $\sqrt{\beta_1}$
and $\beta_2$ of large samples§. As one may see in the latter author's note, the difference
between the values of the moments used in these tables (calculated with approximation)
and the true values can be neglected for most practical purposes‖.

The sampling problem has been handled in two different ways. The first 
considers the case of a sample of any size, the second provides formulae available 
for large samples only; but both these ways present difficulties. 

Most methods given—up to the present—for dealing with a sample of any size 
do not enable us to reach a general expression for the sampling distribution, the 
procedure of calculation being entirely different in each case. As for the case of 
large samples, although general formulae may be worked out, they are only an
approximation with respect to the powers of $1/N$, and if a better value is required

* C. C. Craig. "An Application of Thiele's Semi-invariants to the Sampling Problem." Metron,
Vol. VII. (4), pp. 2-74 (1928).

† R. A. Fisher. "Moments and Product Moments of Sampling Distributions." Proc. Lond. Math.
Soc., Series 2, Vol. XXX. pp. 199-238 (1928).

‡ R. A. Fisher. "The Moments of the Distribution for Normal Samples of Measures of Departure
from Normality." Proc. Roy. Soc. A, Vol. 130, pp. 16-28 (1930).

§ E. S. Pearson. "A Further Development of Tests for Normality." Biometrika, Vol. XXII. p. 239.

‖ E. S. Pearson. "Note on Tests for Normality." Biometrika, Vol. XXII. p. 423.


further calculations are necessary. Therefore efforts are being made to find out 
general formulae by one or other of the two methods mentioned above; nevertheless 
for practical purposes the case of large samples is much the more useful. 

In this paper I shall introduce new functions in connection with the distributions 
of random variables and offer a new method—based on these functions—which will 
allow us to obtain either exact formulae for small samples, or approximated results 
for large samples. Applications will be made to the case of a normal population. 

Before commencing to show the results I have obtained, I must not fail to
acknowledge my gratitude to Professor Karl Pearson, by whose assistance
and criticism these results were carried to their present stage.


Statistical Differentials. 

2. Let us consider a parent population of which the distribution is the following:

    Values          $x_1, x_2, x_3, \ldots x_t$
    Probabilities   $p_1, p_2, p_3, \ldots p_t$

and any sample of $N$ from that population:

    Values                 $x_1, x_2, x_3, \ldots x_t$
    Relative frequencies   $\varpi_1, \varpi_2, \varpi_3, \ldots \varpi_t$.

Let us denote by $\delta\varpi_i$ the difference between $\varpi_i$ and its mean value:

$$\varpi_i = p_i + \delta\varpi_i.$$

The characteristic function of the distribution of the $\delta\varpi_i$'s will be*

$$f(u_1, u_2, \ldots u_t) = e^{-\sum_i p_i u_i}\Big(\sum_i p_i\, e^{u_i/N}\Big)^{N} \qquad (1);$$

therefore in any product moment of the $\delta\varpi_i$'s the lowest power of $1/N$ is $I\big(\tfrac{k+1}{2}\big)$,
where $k$ is the order of the product moment considered, and $I(a)$ denotes the greatest
integer contained by $a$.
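The statement about the lowest power of $1/N$ can be checked exactly in the simplest case of two classes, where $N\varpi$ follows a binomial law. The sketch below is an added illustration under that assumption; the function names and the chosen probability are hypothetical, and rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction
from math import comb


def central_moment_of_frequency(k, n, p):
    """Exact E[(w - p)^k], where n*w is binomial with index n and probability p."""
    p = Fraction(p)
    total = Fraction(0)
    for count in range(n + 1):
        prob = comb(n, count) * p**count * (1 - p) ** (n - count)
        total += prob * (Fraction(count, n) - p) ** k
    return total


def lowest_power(k):
    """I((k + 1) / 2): the greatest integer contained in (k + 1) / 2."""
    return (k + 1) // 2


p = Fraction(1, 3)
q = 1 - p
for k in (2, 3, 4):
    for n in (5, 50):
        moment = central_moment_of_frequency(k, n, p)
        # Scaled by N^I((k+1)/2), the moment remains of finite, non-zero order.
        print(k, n, float(moment * n ** lowest_power(k)))
```

For the second and third moments the scaled values are the same for every $N$; for the fourth they approach $3p^2q^2$ as $N$ increases, in agreement with the rule stated above.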

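Now let us consider any function of the sample, i.e. of $\varpi_1, \varpi_2, \ldots \varpi_t$, the $x_i$'s
being constant when the parent population does not change. Let $F(\varpi_1, \varpi_2, \ldots \varpi_t)$
be such a function; the same function, but with respect to the parent population,
will be $F(p_1, p_2, \ldots p_t)$.

Between these two functions we have the following relation:

$$\delta F(\varpi_1, \varpi_2, \ldots \varpi_t) = F(\varpi_1, \varpi_2, \ldots \varpi_t) - F(p_1, p_2, \ldots p_t) = \sum_i \delta\varpi_i\,\frac{\partial F}{\partial p_i} + \frac{1}{2!}\sum_{i,j}\delta\varpi_i\,\delta\varpi_j\,\frac{\partial^2 F}{\partial p_i\,\partial p_j} + \ldots \qquad (2).$$

* See Note I, at the end.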







We observe that in order to find out the moments of $\delta F(\varpi_1, \varpi_2, \ldots \varpi_t)$ with
a certain degree of approximation, owing to the remark made above concerning the
moments of $\delta\varpi_i$, we reach the first approximation in $1/N$ by taking the first term
only of the right-hand side of the relation (2). That approximation would be
rendered closer by taking more terms in the same relation.

Let $F'(f_1', f_2', f_3', \ldots)$ be a composite function of the sample, the corresponding
function of the parent population being $F(f_1, f_2, f_3, \ldots)$. Supposing that we
want an approximation of the second order, we shall have


hence 




where on the right-hand side we must neglect, of course, the terms of the third and
fourth order in $\delta\varpi_i$; thus




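In the same way

$$\delta_3 F' = \sum_i \delta_3 f_i'\,\frac{\partial F}{\partial f_i} + \frac{1}{2!}\sum_{i,j}\left(\delta_2 f_i'\,\delta_1 f_j' + \delta_1 f_i'\,\delta_2 f_j' - \delta_1 f_i'\,\delta_1 f_j'\right)\frac{\partial^2 F}{\partial f_i\,\partial f_j} + \frac{1}{3!}\sum_{i,j,k}\delta_1 f_i'\,\delta_1 f_j'\,\delta_1 f_k'\,\frac{\partial^3 F}{\partial f_i\,\partial f_j\,\partial f_k},$$

$$\delta_4 F' = \sum_i \delta_4 f_i'\,\frac{\partial F}{\partial f_i} + \frac{1}{2!}\sum_{i,j}\left(\delta_3 f_i'\,\delta_1 f_j' + \delta_2 f_i'\,\delta_2 f_j' + \delta_1 f_i'\,\delta_3 f_j' - \delta_2 f_i'\,\delta_1 f_j' - \delta_1 f_i'\,\delta_2 f_j'\right)\frac{\partial^2 F}{\partial f_i\,\partial f_j}$$

$$\qquad + \frac{1}{3!}\sum_{i,j,k}\left(\delta_2 f_i'\,\delta_1 f_j'\,\delta_1 f_k' + \delta_1 f_i'\,\delta_2 f_j'\,\delta_1 f_k' + \delta_1 f_i'\,\delta_1 f_j'\,\delta_2 f_k' - 2\,\delta_1 f_i'\,\delta_1 f_j'\,\delta_1 f_k'\right)\frac{\partial^3 F}{\partial f_i\,\partial f_j\,\partial f_k}$$

$$\qquad + \frac{1}{4!}\sum_{i,j,k,l}\delta_1 f_i'\,\delta_1 f_j'\,\delta_1 f_k'\,\delta_1 f_l'\,\frac{\partial^4 F}{\partial f_i\,\partial f_j\,\partial f_k\,\partial f_l}.$$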


These formulae are very useful and similar ones—for higher degrees of approxi¬ 
mation—may be easily obtained. They enable us to make a connection between 
mathematical and statistical differentials. We shall make a first application of the 
formula (4) to the sampling semi-invariants. 

Approximate Formulae for the Product Moments of the Sampling Semi-invariants. 


3. Let

$$e^{\phi(t)} = e^{\,s_1\frac{t}{1!} + s_2\frac{t^2}{2!} + \ldots + s_p\frac{t^p}{p!} + \ldots} = 1 + m_1\frac{t}{1!} + m_2\frac{t^2}{2!} + \ldots + m_p\frac{t^p}{p!} + \ldots \qquad (7)$$

be the characteristic function of the parent population, the same function for the
sample being

$$e^{\,s_1'\frac{t}{1!} + s_2'\frac{t^2}{2!} + \ldots + s_p'\frac{t^p}{p!} + \ldots} = 1 + m_1'\frac{t}{1!} + m_2'\frac{t^2}{2!} + \ldots + m_p'\frac{t^p}{p!} + \ldots \qquad (8).$$

(In these two relations the notations are those usually adopted; i.e. the $s$'s stand for
Thiele's semi-invariants, and $m$ for the moments about a fixed origin.)

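Let us apply the formula (4) to the relation (8); we shall have

$$\frac{t}{1!}\,\delta_1 s_1' + \frac{t^2}{2!}\,\delta_1 s_2' + \ldots + \frac{t^p}{p!}\,\delta_1 s_p' + \ldots = \left(\frac{t}{1!}\,\delta m_1' + \frac{t^2}{2!}\,\delta m_2' + \ldots\right) e^{-\phi(t)} \qquad (9),$$

and if we put

$$e^{-\phi(t)} = 1 + M_1\frac{t}{1!} + M_2\frac{t^2}{2!} + \ldots + M_p\frac{t^p}{p!} + \ldots \qquad (10),$$

the relation (9) will give

$$\delta_1 s_p' = \delta m_p' + \binom{p}{1} M_1\,\delta m_{p-1}' + \ldots + \binom{p}{i} M_i\,\delta m_{p-i}' + \ldots + \binom{p}{p-1} M_{p-1}\,\delta m_1' \qquad (A),$$

where

$$M_p = \sum (-1)^{\rho_1 + \rho_2 + \ldots}\,\frac{p!}{(r_1!)^{\rho_1}(r_2!)^{\rho_2}\ldots\;\rho_1!\,\rho_2!\ldots}\; s_{r_1}^{\rho_1}\, s_{r_2}^{\rho_2}\ldots \qquad (B),$$

with $\sum \rho_i r_i = p$.

It is worth noticing that $m_p'$ being a linear function of the $\varpi_i$ we have $\delta m_p' = \delta_k m_p'$.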

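4. The relation (A) enables us to calculate the first approximation of the
product moment of any two semi-invariants. By using that formula we have

$$\{\delta_1 s_a'\,\delta_1 s_b'\} = \sum_i \sum_j \binom{a}{i}\binom{b}{j} M_{a-i}\, M_{b-j}\,\{\delta m_i'\,\delta m_j'\} \qquad (11),$$

and since $\{\delta m_i'\,\delta m_j'\} = \dfrac{1}{N}\,(m_{i+j} - m_i m_j)$,

the relation (11) becomes

$$\{\delta_1 s_a'\,\delta_1 s_b'\} = \frac{1}{N}\sum_i\sum_j \binom{a}{i}\binom{b}{j} M_{a-i} M_{b-j}\, m_{i+j} - \frac{1}{N}\sum_i \binom{a}{i} M_{a-i}\, m_i \sum_j \binom{b}{j} M_{b-j}\, m_j \qquad (12).$$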

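Now let us return to the relation (7), which by using (10) may be written
down as follows:

$$\left(1 + m_1\frac{t}{1!} + m_2\frac{t^2}{2!} + \ldots + m_p\frac{t^p}{p!} + \ldots\right)\left(1 + M_1\frac{t}{1!} + M_2\frac{t^2}{2!} + \ldots + M_p\frac{t^p}{p!} + \ldots\right) = 1.$$

The comparison of the coefficients of the same powers of $t$ gives

$$M_1 + m_1 = 0,$$

$$M_2 + 2M_1 m_1 + m_2 = 0,$$

$$\ldots$$

$$M_p + \binom{p}{1} M_{p-1}\, m_1 + \ldots + \binom{p}{i} M_{p-i}\, m_i + \ldots + m_p = 0,$$

or

$$\sum_{i=1}^{p}\binom{p}{i} M_{p-i}\, m_i = -M_p \qquad (C).$$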
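The relation (C) is a recurrence from which the $M_p$ may be computed one after another from the moments. The sketch below is an added illustration: the function name is hypothetical, and the population used (the values 0 and 1 with equal probability, so that every $m_p = \tfrac12$ for $p \ge 1$) is an assumption chosen only because its reciprocal series is easy to check by hand.

```python
from fractions import Fraction
from math import comb


def reciprocal_series(m, order):
    """Coefficients M_0..M_order of e^{-phi(t)}, from moments m[0..order], m[0] = 1.

    Uses relation (C): M_p = -sum_{i=1..p} C(p, i) * M_{p-i} * m_i.
    """
    big_m = [Fraction(1)]
    for p in range(1, order + 1):
        big_m.append(-sum(comb(p, i) * big_m[p - i] * m[i] for i in range(1, p + 1)))
    return big_m


order = 6
moments = [Fraction(1)] + [Fraction(1, 2)] * order   # x = 0 or 1, each with probability 1/2
big_m = reciprocal_series(moments, order)
print(big_m)

# The product of the two series must reduce to 1: every higher coefficient vanishes.
for p in range(1, order + 1):
    coefficient = sum(comb(p, i) * big_m[p - i] * moments[i] for i in range(p + 1))
    assert coefficient == 0
```

Here $e^{\phi(t)} = \tfrac12(1 + e^t)$, so that $e^{-\phi(t)} = 1 - \tanh(t/2)$, whose even coefficients beyond the first vanish; the recurrence reproduces this.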











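That relation enables us to write down the following formula:

$$\{\delta_1 s_a'\,\delta_1 s_b'\} = \frac{1}{N}\left[\sum_i\sum_j \binom{a}{i}\binom{b}{j} M_{a-i} M_{b-j}\, m_{i+j} - M_a M_b\right] \qquad (13).$$

This is a very convenient formula, especially for practical purposes, the expression
which gives $M_p$ being similar to that which gives $m_p$ as a function of the
semi-invariants, except for some of the signs. But for the result we are seeking we
shall proceed to change its form. For that purpose let us consider the series

$$e^{-\phi(t)}\,\frac{d^i}{dt^i}\left[e^{\phi(t)}\right] = N_0^{(i)} + N_1^{(i)}\frac{t}{1!} + N_2^{(i)}\frac{t^2}{2!} + \ldots \qquad (D);$$

then owing to the formula (10) and the following relation:

$$\frac{d^i}{dt^i}\, e^{\phi(t)} = m_i + m_{i+1}\frac{t}{1!} + m_{i+2}\frac{t^2}{2!} + \ldots,$$

we shall have

$$m_i = N_0^{(i)},$$

$$m_i M_1 + m_{i+1} = N_1^{(i)},$$

$$m_i M_2 + 2M_1 m_{i+1} + m_{i+2} = N_2^{(i)},$$

$$\ldots$$

or

$$\sum_j \binom{p}{j} M_{p-j}\, m_{i+j} = N_p^{(i)} \qquad (E);$$

hence $N_p^{(0)} = 0$, and if we put

$$N_{a,b}^{(0)} = \sum_i \binom{a}{i} M_{a-i}\, N_b^{(i)} \qquad (14),$$

the relation (13) can be reduced to the simpler form

$$\{\delta_1 s_a'\,\delta_1 s_b'\} = \frac{1}{N}\, N_{a,b}^{(0)} \qquad (F).$$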


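Now, by multiplying the relation (D) by $\dfrac{u^i}{i!}$ and taking the sum from $i = 0$ to
infinity, we obtain

$$\sum_{i=0}^{\infty}\sum_{p=0}^{\infty} N_p^{(i)}\,\frac{t^p}{p!}\,\frac{u^i}{i!} = e^{-\phi(t)}\sum_{i=0}^{\infty}\frac{u^i}{i!}\,\frac{d^i}{dt^i}\, e^{\phi(t)};$$

hence

$$\sum_{i=0}^{\infty}\sum_{p=0}^{\infty} N_p^{(i)}\,\frac{t^p}{p!}\,\frac{u^i}{i!} = e^{\phi(t+u) - \phi(t)} \qquad (G).$$

That series gives by its coefficients the values of $N_p^{(i)}$. Multiplying the relation (G) by

$$e^{-\phi(u)} = 1 + M_1\frac{u}{1!} + M_2\frac{u^2}{2!} + \ldots + M_p\frac{u^p}{p!} + \ldots,$$

we obtain

$$e^{\phi(t+u) - \phi(t) - \phi(u)} = \sum_{a=0}^{\infty}\sum_{b=0}^{\infty} N_{a,b}^{(0)}\,\frac{u^a}{a!}\,\frac{t^b}{b!} \qquad (H),$$

a relation which enables us to calculate the first approximation of any second
order product moment $\{\delta_1 s_a'\,\delta_1 s_b'\}$ in terms of the semi-invariants of the parent
population.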
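The coefficients of the series (H) can be extracted mechanically once the semi-invariants of the parent population are given. The sketch below is an added illustration and makes several assumptions: the function names are hypothetical, the semi-invariants are supplied as a dictionary, and a truncated power series in two variables is stored as a dictionary of exponent pairs. It relies on the fact that $\phi(t+u) - \phi(t) - \phi(u)$ contains the term $s_{i+j}\,t^i u^j / (i!\,j!)$ for every $i, j \ge 1$.

```python
from fractions import Fraction
from math import factorial


def cross_exponent(s, max_deg):
    """Coefficients of phi(t+u) - phi(t) - phi(u), keyed by the exponents (i, j)."""
    series = {}
    for i in range(1, max_deg):
        for j in range(1, max_deg - i + 1):
            value = Fraction(s.get(i + j, 0))
            if value:
                series[(i, j)] = value / (factorial(i) * factorial(j))
    return series


def multiply(a, b, max_deg):
    """Product of two bivariate series, dropping terms above total degree max_deg."""
    out = {}
    for (i, j), x in a.items():
        for (k, l), y in b.items():
            if i + j + k + l <= max_deg:
                out[(i + k, j + l)] = out.get((i + k, j + l), 0) + x * y
    return out


def exp_series(e, max_deg):
    """exp(e) for a series e without constant term, truncated at max_deg."""
    result = {(0, 0): Fraction(1)}
    term = {(0, 0): Fraction(1)}
    for n in range(1, max_deg // 2 + 1):   # every term of e has total degree >= 2
        term = {key: value / n for key, value in multiply(term, e, max_deg).items()}
        for key, value in term.items():
            result[key] = result.get(key, 0) + value
    return result


def n_ab(s, a, b):
    """N_{a,b}^{(0)}: N times the first approximation of {delta_1 s_a' delta_1 s_b'}."""
    series = exp_series(cross_exponent(s, a + b), a + b)
    return series.get((a, b), Fraction(0)) * factorial(a) * factorial(b)


s = {2: Fraction(2), 3: Fraction(1), 4: Fraction(5), 5: Fraction(7)}  # illustrative values
print(n_ab(s, 1, 1), n_ab(s, 2, 2), n_ab(s, 2, 3))
```

With these illustrative values the three printed coefficients are $s_2$, $s_4 + 2s_2^2$ and $s_5 + 6s_2 s_3$, that is, the familiar leading terms of the variance of the mean, the variance of the second semi-invariant, and the product moment of the second and third.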











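5. Let us now pass to the mean third order product; we shall have

$$\{\delta_1 s_a'\,\delta_1 s_b'\,\delta_1 s_c'\} = \Big\{\Big[\sum_i \binom{a}{i} M_{a-i}\,\delta m_i'\Big]\Big[\sum_j \binom{b}{j} M_{b-j}\,\delta m_j'\Big]\Big[\sum_k \binom{c}{k} M_{c-k}\,\delta m_k'\Big]\Big\}$$

$$= \sum_i\sum_j\sum_k \binom{a}{i}\binom{b}{j}\binom{c}{k} M_{a-i}\, M_{b-j}\, M_{c-k}\,\{\delta m_i'\,\delta m_j'\,\delta m_k'\},$$

and since

$$\{\delta m_i'\,\delta m_j'\,\delta m_k'\} = \frac{1}{N^2}\left(m_{i+j+k} - m_i\, m_{j+k} - m_j\, m_{k+i} - m_k\, m_{i+j} + 2\,m_i m_j m_k\right)$$

we shall obtain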

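$$\{\delta_1 s_a'\,\delta_1 s_b'\,\delta_1 s_c'\} = \frac{1}{N^2}\Big[\sum_i\sum_j\sum_k \binom{a}{i}\binom{b}{j}\binom{c}{k} M_{a-i} M_{b-j} M_{c-k}\, m_{i+j+k}$$

$$\qquad - \sum_i \binom{a}{i} M_{a-i}\, m_i \sum_j\sum_k \binom{b}{j}\binom{c}{k} M_{b-j} M_{c-k}\, m_{j+k}$$

$$\qquad - \sum_j \binom{b}{j} M_{b-j}\, m_j \sum_i\sum_k \binom{a}{i}\binom{c}{k} M_{a-i} M_{c-k}\, m_{i+k}$$

$$\qquad - \sum_k \binom{c}{k} M_{c-k}\, m_k \sum_i\sum_j \binom{a}{i}\binom{b}{j} M_{a-i} M_{b-j}\, m_{i+j}$$

$$\qquad + 2\sum_i \binom{a}{i} M_{a-i}\, m_i \sum_j \binom{b}{j} M_{b-j}\, m_j \sum_k \binom{c}{k} M_{c-k}\, m_k\Big] \qquad (15).$$

According to the relation (14) let us put

$$N_{b,c}^{(i)} = \sum_j \binom{b}{j} M_{b-j}\, N_c^{(i+j)} \qquad (E^{bis})$$

and

$$N_{a,b,c}^{(0)} = \sum_i \binom{a}{i} M_{a-i}\, N_{b,c}^{(i)}.$$

By using these relations and the previous ones, (C) and (E), the formula (15)
becomes

$$\{\delta_1 s_a'\,\delta_1 s_b'\,\delta_1 s_c'\} = \frac{1}{N^2}\, N_{a,b,c}^{(0)} \qquad (F^{bis}).$$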

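But in order to have an approximation as far as the second power of $1/N$ we
must calculate the fourth order product moment also, for

$$\{\delta m_i'\,\delta m_j'\,\delta m_k'\,\delta m_l'\} = \frac{1}{N^3}\Big(m_{i+j+k+l} - \underset{(4)}{S}\, m_i\, m_{j+k+l} + \underset{(6)}{S}\, m_i m_j\, m_{k+l} - 3\,m_i m_j m_k m_l\Big)$$

$$\qquad + \frac{N-1}{N^3}\Big(\underset{(3)}{S}\, m_{i+j}\, m_{k+l} - \underset{(6)}{S}\, m_i m_j\, m_{k+l} + 3\,m_i m_j m_k m_l\Big),$$

which gives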


{Si®/ 81V 8 i«/ 8 i«/} = 2^8 


i. £,£.(?'*' 

111 




+ "-? 4 *G) ©^-"-^‘’+" 4 *!'© (D- sr -"^r D 

+jf,x,x,(f) Q 








+ M c M 4 £< (*) + M<M t i Q M t -> N? 

+ ^ [f < (“) • i Q tf-* i7f + i Q N ?. i (*) Jf M 27? 

+ iQl^ ( _ lk 2^* ) .i i Qif fc _ ^ 27 d u, ]...( 10 ). 

In the same way as for the formulae (F) and (F Wl ) by putting 




the relation (16) becomes

$$\{\delta_1 s_a'\,\delta_1 s_b'\,\delta_1 s_c'\,\delta_1 s_d'\} = \frac{1}{N^3}\, N_{a,b,c,d}^{(0)} + \frac{N-1}{N^3}\left[N_{a,b}^{(0)} N_{c,d}^{(0)} + N_{a,c}^{(0)} N_{b,d}^{(0)} + N_{a,d}^{(0)} N_{b,c}^{(0)}\right] \qquad (F^{ter}).$$

Now let us differentiate the relation (Q) i times with respect to u: 

. <i7 >- 


hence, by considering the relation (E bu ), we deduce 


and further 




•(D“). 


00 00 00 ... njfl fy mA 

» 2 , S Y 2 , Nf f- r ~ r l .(G“*). 

0 0 0 p *'pfi yi 11 

If we multiply that relation by we finally obtain 

$$e^{\phi(t+u+v) - \phi(t) - \phi(u) - \phi(v)} = \sum_{a=0}^{\infty}\sum_{b=0}^{\infty}\sum_{c=0}^{\infty} N_{a,b,c}^{(0)}\,\frac{t^a}{a!}\,\frac{u^b}{b!}\,\frac{v^c}{c!} \qquad (H^{bis}).$$

This relation is to the third order product moment what (H) is to the second,
i.e. the series on the right-hand side enables us to calculate the first approximation
of $\{\delta_1 s_a'\,\delta_1 s_b'\,\delta_1 s_c'\}$.

The reader can now repeat the above calculation for the case of four semi-invariants.
The result is the following formula:

$$e^{\phi(t+u+v+w) - \phi(t) - \phi(u) - \phi(v) - \phi(w)} = \sum_{a=0}^{\infty}\sum_{b=0}^{\infty}\sum_{c=0}^{\infty}\sum_{d=0}^{\infty} N_{a,b,c,d}^{(0)}\,\frac{t^a}{a!}\,\frac{u^b}{b!}\,\frac{v^c}{c!}\,\frac{w^d}{d!} \qquad (H^{ter}),$$

which gives the first term on the right-hand side of ($F^{ter}$), the expressions involved
by the coefficient of $\dfrac{N-1}{N^3}$ being given by (H).
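The product moments of the $\delta m'$ on which this section rests (second, third and fourth order) can be verified exactly by enumerating every possible sample from a small discrete population. The sketch below is an added illustration: the population values and probabilities are hypothetical, the function names are not the author's, and the fourth order expression is written with the three kinds of sum $S$ spelt out.

```python
from fractions import Fraction
from itertools import combinations, product

XS = [Fraction(0), Fraction(1), Fraction(3)]           # hypothetical population values
PS = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # their probabilities


def m(k):
    """Population moment of order k about the fixed origin."""
    return sum(p * x**k for x, p in zip(XS, PS))


def exact_product_moment(orders, n):
    """Mean of the product of delta m'_k over k in orders, by enumerating all samples."""
    total = Fraction(0)
    for sample in product(range(len(XS)), repeat=n):
        term = Fraction(1)
        for idx in sample:
            term *= PS[idx]                            # probability of this sample
        for k in orders:
            term *= sum(XS[idx] ** k for idx in sample) / n - m(k)
        total += term
    return total


def second_order(i, j, n):
    return (m(i + j) - m(i) * m(j)) / n


def third_order(i, j, k, n):
    return (m(i + j + k) - m(i) * m(j + k) - m(j) * m(i + k) - m(k) * m(i + j)
            + 2 * m(i) * m(j) * m(k)) / n**2


def fourth_order(orders, n):
    total = sum(orders)
    prod4 = m(orders[0]) * m(orders[1]) * m(orders[2]) * m(orders[3])
    s_one_three = sum(m(a) * m(total - a) for a in orders)         # 4 terms
    s_two_two = Fraction(0)
    s_one_one_two = Fraction(0)
    for a, b in combinations(range(4), 2):                         # 6 choices of a pair
        c, d = [q for q in range(4) if q not in (a, b)]
        s_one_one_two += m(orders[a]) * m(orders[b]) * m(orders[c] + orders[d])
        s_two_two += m(orders[a] + orders[b]) * m(orders[c] + orders[d])
    s_two_two /= 2                                                 # each pairing met twice
    first = m(total) - s_one_three + s_one_one_two - 3 * prod4
    second = s_two_two - s_one_one_two + 3 * prod4
    return first / n**3 + (n - 1) * second / n**3


n = 3
print(exact_product_moment((1, 2), n) == second_order(1, 2, n))
print(exact_product_moment((1, 1, 2), n) == third_order(1, 1, 2, n))
print(exact_product_moment((1, 2, 1, 3), n) == fourth_order((1, 2, 1, 3), n))
```

Since rational arithmetic is used throughout, the three comparisons are exact identities and not numerical approximations; they hold for any sample size for which the enumeration is feasible.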

6. Now one may proceed and define by induction the general expression
$N^{(i)}_{a_1, a_2, a_3, \ldots a_n}$, which is given by the coefficients of the series




XK...IXK\ . 

0 0 00 0,1 «t’ a»! i\ 









Still by induction we can write down the following table of formulae: 




i l u 


-- M ° K' - N T + MaM t m k , 


=- if. Jfj,+if, Jf. 

+ If.Jf.if® + 

(“) (*) Q (^) M a . t M^M c . k M a . t m i+j+M+P 

- + s jf.if,ir« -sjf.if,jf.irw 

+ MaMtM'Mim,, 


+<n+» 


7-(t) - (“j ••• 

- »• ...... - s ». + ■*.,*.. 

(!) .(J), 

where $\underset{(q)}{S}$ stands for a sum of $q$ terms.

The formulae ($G^{ter}$) and (J) enable us to carry out the calculation of the first
approximation of any product moment of the $\delta_1 s_p'$'s, provided that we have already
found out the formula giving the corresponding product moment of the $\delta m_p'$'s.

By the method, which I shall give later (§ 8), we obtain
{8m/ 8m/ Sm k 8m/ 8m„' j 

= at* M m t+i+h+i+» ~ 8 %%+h+i+ii — Sm <+ ,m* +t+J) 


(5) 


( 10 ) 


+ 2 ® ™i m i m w+9 + 2Sm f % fc m l+p 


+ JV' 3 ~ &w ii7n i 7n k+i+v ~ 2^ niim j+k m t+p 

+ 5S%»^m fc m, +1> — 20?7i<7?i i ??i*m l 7?i J) ”| .(18), 







{&»/ &»/ Sm k ' Sot,' Snip Sot,'} 


” Tfrl ^i+j+fc+i+p+e ~ S m { m i+t+t+p+<l —S ot (+/ 

" (6) (M 


(IS) 


- Sm <+/+fc OT l+l , + , + 2Swi < TO / m* +l+>+ , + 2 S to, m, 

(10) (IS) (80) 

+ 2Sm, +J m l!+l m JH< — 6 S m, to, to* ot, + p+ , - 6 S to, to, to*. , m p+ . 

(IS) (80) («) 

+ 24Sto,to,to*to, ot p+s — 120 to, m, to* to, m P m q 
as) J 

+ -5T« 8 m l+i m k+l-H>+Q + S Ml+i+k !»!+»+<, - S TO, TOy OT i+ , +p+4 

" La») (io) as) 


- 3S %%n M - 2SOT 1 OT J+fc m, +p+(I + SSwjTO^to,.,**,, 

OS) (60) (80) 

-f 7 S to, to, to* +i to p+9 - 26 S to, to, to* to, ot p+ , +1 30 to, to, to * to, to„ to, 

(46) (18) 


4- ~TFfm + 3 Sra< ra y mj 

" L(16) (46) (15) 

— ...(19). 


We have now everything which is necessary for proceeding to the calculation
of the fifth and sixth order product moments of the $\delta_1 s_p'$'s. We shall make one
remark only concerning this calculation: one should not trouble too much about
the sums $S$, for, the expressions involved by the formulae (J), (18) and (19) being
symmetrical, the easiest way to deal with them is to count the number of terms
which are analogous, independently of the number of different ways in which these
may be obtained.

Let us take for an example the calculation of the coefficient of ^ on the 
right-hand side of (18). We shall have 


S m i + j m k+l+P - S 
( 10 ) ( 10 ) 

- S Nf ib (N^ d _ e - SIf. if®. - M 0 M d M.) 

(10) ’ ’ ’ (8) 

S m ( m i+k m l+t> 

( 18 ) 

-8 -&M a M b M e N ( t - 15M a M h M c M a M. 

S to,to,to*to, +c -*■ — S M a M b M c Nj\ — 10 M a M b M t M d M t ; 

( 10 ) ( 10 ) 

hence the coefficient considered is S 


( 10 ) 


In the same way we obtain 

(W^VWW- mK\, tft . .< F, ’> 


( 10 ) 







and 


{^iSaBiSiBia/SiSa^i s/Si s/} 

<»• K*KU ../+s 

+ —jr-sWA.<*")■ 

Let $L^{(0)}_{a,b,c,\ldots}$ be the semi-invariant corresponding to the product moment
$\{\delta_1 s_a'\,\delta_1 s_b'\,\delta_1 s_c' \ldots\}$. We shall have


r 0 ) _ 1 at( 0 ) 
a,b* 

£“V <jf i“V 4 - s 

r«> = _L('Ar ,#) _ o vr(°) jir(»> x 

J 'a,b,c.d,e N iV y a,b,c,d.« ( ®, <*. » iV a * •* 

o. *r* <■<’»... *./ -■*!>?»./ 

*■*?.*?.»?/>.<*» 

(10) (15) 


The Associated Functions of Random Distributions.

7. The form of the right-hand sides of the formulae (20) led me to try to find 
the general expression of c v 

In order to work out this general problem I shall introduce new functions in 
connection with the distribution of random variables. 


Supposing that we have a set of variables

$$y_0,\ y_1,\ y_2,\ \ldots\ y_i,\ \ldots \qquad (21).$$

The following function,

$$\alpha_p(u_1, u_2, \ldots u_p) = \sum \frac{u_1^{i_1}}{i_1!}\,\frac{u_2^{i_2}}{i_2!}\ldots\frac{u_p^{i_p}}{i_p!}\,\{y_{i_1}\, y_{i_2} \ldots y_{i_p}\} \qquad (22),$$

will be called the associated function with the $p$th order product moments of the set
(21). In the same way I define $\beta_p(u_1, u_2, \ldots u_p)$, the associated function with
the $p$th order semi-invariants.

Since we may easily establish the formulae 

«ui ..i *» w*ui..i — SSmioo..o^on ..i 
+ 2! SSmuoo..o^(ao.,oWooii..i 
— 3 ! SSmio,.,o oWlooio. .0^*00011.. i 
+ 


(23), 










and »»u...i“«u. ..i + 8S«i # o..e%i...i 

+ SS*». ,o s oao..o^ooi....i 
+ SS«io,..o«oio.. o^ooxo.. 0 S 9901 ... 1 
+.(24), 

we shall have the following relations between the a’s and ffa: 

&>(«».«•, • «»)-a,— tt„)-SSa»(« <l( w <t ,... ... m^) 

+ 2! SSa* («<„«,„ ... u ik )a,(u it+l ,... u ik+l ) a p _*_,(w< i+J+1 . ... %) 

“3(«<*«> -. «<»+,)a«(«<* + i +l . - «<»+,+*) 

(M < 4+<+m+1 , • • • M<^) 

+...(25), 


a„(«!,«*, = A,(wi.ut,... w,) + SS/9*(u<,, u v „ ... w <t ) $ rk (w <i+1 ,...«<„) 

't (^ii <.»••• ^<jfc) A ( W <i+1» *• • —1 i ^ ip ) 

+ SS/9j (w<,, Wjj, ... Ui k ) fJi(llf k+1 , ... M( k+ i) ( u <t + ; +1 ) ... M <* + | +m ) 

0, -*-*-*, ( tt <*+*+m+1» * •* w fp) 

+.( 26 ). 

I think that the associated functions will allow us to treat many questions with 
regard to an infinite series of random variables—like the sampling moments or 
semi-invariants—in a more effective manner than the corresponding characteristic 
function does. Moreover, I have already dealt with such functions in the previous 
chapter, and their usefulness causes me to make special mention of them. 

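Let

$$g(t) = y_0 + y_1\frac{t}{1!} + y_2\frac{t^2}{2!} + \ldots + y_p\frac{t^p}{p!} + \ldots$$

be the generant function of the $y$'s, and $F\,dv$ the elementary probability of the
event $(y_0, y_1, y_2, \ldots y_p, \ldots)$.

We shall have

$$\alpha_1(t_1) = \int_{(D)} F\, g(t_1)\, dv,$$

$$\alpha_2(t_1, t_2) = \int_{(D)} F\, g(t_1)\, g(t_2)\, dv,$$

$$\ldots$$

$$\alpha_p(t_1, t_2, \ldots t_p) = \int_{(D)} F\, g(t_1)\, g(t_2) \ldots g(t_p)\, dv \qquad (27);$$

hence it follows

$$\psi(t_1, t_2, \ldots t_n, \ldots) = \int_{(D)} F\, e^{\,g(t_1) + g(t_2) + \ldots + g(t_n) + \ldots}\, dv = 1 + \frac{1}{1!}\sum_i \alpha_1(t_i) + \frac{1}{2!}\sum_{i,j} \alpha_2(t_i, t_j) + \ldots \qquad (28).$$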












A similar relation may be established for a multivariate distribution. If 

.. . t u t * tu n* 

g(t, u) = 3 / 0,0 + yi,o j-j -+-y 0 .1 j —1 +y*.« 2 ! + JMFl' D + ¥\ + 

is the generant function of y tti we shall have 

. / tl, <j, ... ... \ f jp e x Ifl(<l ,«,)+»<«* «,)+....«( t n , u„)+ ] d v 

~ V«l, Ut, ... Up, .../ J(D) 

= e ii S ‘ A («*) + 5FI («<(,' i) + • .(28 bta ); 


hence it follows 


and 


■ tp ) 


■■ tp )-SSa k ( 

r U v t iti , 

..Up) * 

W W«, * 

,..u p J *\ 

<U 4l , u i%t 

+ 2ISS a k ( til ” 

■■*<*) at 

-V w +l 

»••• *<*+i 






* M + 88,8*1 


■ Up/ 

W V 2 ,. 

.. Up/ 

M <.> 

+ SS/9 fc ( 

) ^< a > •* 

'■ \ O (hlc VI' 

•"W 


• • • M <l+I i 


/ \w<* +l+1 . •••«<»/ 


These relations—which may be formal only—represent the generalisation of 
the characteristic function, and enable us to find out any associated function. 

Associated Functions of the Distribution of Moments and Product Moments about a Fixed Origin.

8. I shall first make an application of the formula (28) to the correlated
distribution of sampling moments about a fixed point*.

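In that case

$$g(t) = \frac{t}{1!}\,\delta m_1' + \frac{t^2}{2!}\,\delta m_2' + \ldots + \frac{t^p}{p!}\,\delta m_p' + \ldots = \frac{1}{N}\sum_{i=1}^{N} e^{t x_i} - e^{\phi(t)} \qquad (29),$$

where $e^{\phi(t)}$ is the characteristic function of the distribution of $x$. Thus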

X « JV 

f+oo f +00 /*+oo 1 

= drJ dx t ... f(xi)f(xt)...f(x s )e N ' i 

J -00 J —00 J 00 

538 ••• •**)]^> 


* A method for calculating the semi-invariants of the distribution of one moment only has been
indicated by C. C. Craig, loc. cit. p. 18.









where 


*(k, [ + f{x)e* 

J -00 




dx 


~ l+ h.‘T' ti ' i ***** - 

+ ^- r -p 2 u .*<£* lti '~ e* <v 3) C*'V - e* (, i>3> Ce* ( ‘‘> - 
the double brackets standing for a symbolical multiplication in order to have 
■eft, <.) = p [*«**>-*<*»*<*>], 

a*(<i, <*, i g ) = ~ t [e*— e *<<!>+*<<,+V 


- + 2e+ ( *i )+ * cV+^<W] f 


tt,...tp)= b-[e* {t i +< «+ +< p’ — Se* (< » ,+ * (< »H+ +V 

+ Sc* (t i ) + 4 «»>+4<‘»+<4++‘p ) 


+ (_ I)p-i(j3 _ l)e*«i>+* «,)++*«,)] 

.(30); 

hence the / 3 ’s connected with yjr (t lt t», U, ... <», •••) and giving the semi-invariants 
of the Sm p ' ’a will be 

b, (ti,t») = j[<t*< t i+V-g*< hh* «•>], 

b»(<i, <*, «*)« p[e*«i+*.+‘.»-Se*»!>+♦“.+«.> + 2e*< < i>+*<V-^‘V], 

b 4 «i, t», ti, U) = ~ [e*«i+W»4> - Se+«i>+* «•+<,+««> 

- Se*“i+V-*<%+<4> + 2 !Se*“i ,+ *“i ,+ * (< »H> 

_ 3 | g* («!>+♦«,)+♦»,)+♦ «4>], 


bp (h, ti,... t p ) = [e* (, i +( «+ +V — S S e*•+*»>+*« t+1 + +< p > 

+ 2! S S e* < < i+*i+ +«<)+* (*<+i+ +<<+*>++(«i + *+i++‘p) 

+ (_ l)P~l(p - 1) ! «*«!>+*<%>+ +#«p>] ) 

.(31). 

The associated functions with the moments of the $\delta m_p'$'s are easily calculated
either from (30) or from (31):

a* (<j,<»)«-^[c* “i+V - e* «i>+* <V], 
caih, t t , f») = p [*♦«»+**> -Se+<M+* «.+*»> + 2e* < *i , +*(‘s*], 









a t (<i, t t , U, U) = -^8 [«* lt i+h'-h+*i> — Se* (< i ,+ * ,{ « +( »+ ( 4* 

+ <*«>+♦ «<*>+♦«*»] 
+ N ~ t } s [e* «!+*.» - e* <*i»++ '*«>] [e* «.+».> - e* «.>+* ««>], 


^p(fly ••• ^jj) *■ ^ a j> (^1> ••• ^p) 

-h-ZV(iVT — 1) Sa*(4, ^ 2 ,... 4) a p-*(£*+i> ^*+i> ••• $p) 

+ i\T(iV — l)(iV'— 2)Sa*(^i, ... tf*) a i(**+i> ••• ^+i) a p-i-*(^+i+i» • •• t») 

+...(32); 


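hence we may obtain the product moments of the $\delta m_p'$'s. (See above, formulae
(18) and (19).) Their calculations present no difficulties, for the coefficient of
$\dfrac{u_1^{\,i}}{i!}\,\dfrac{u_2^{\,j}}{j!}\,\dfrac{u_3^{\,k}}{k!}\ldots$ in $e^{\phi(u_1 + u_2 + u_3 + \ldots)}$ is $m_{i+j+k+\ldots}$.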


9. The method which has been developed for finding the product moments 
and the semi-invariants of the distribution of sampling moments about a fixed 
origin is readily extended to the case of sampling product moments about a fixed 
origin of a multivariate distribution. In addition to being very useful, this problem 
is also a further illustration of the method of associated functions. I shall treat it 
for the case of a frequency function of two variates, after which the calculations 
for a frequency function of three or more variates will be straightforward. 

We shall have 


g ( t , u) = j-j Sw'io + j-j Sm'oi + 4* j-\ • J] & m 'u + ^ 


* a u 


t u 2 


hence 

where 


+ gj Sm'ao + *2J. Y\ & m 'to + Jpj • Y\ W 4 3j + •. 

1 % r Xit+y^ Yi 

-yf*L‘ - ,W J : 

f( h ' 1 .. •• a _ r*f '*■—‘••••ai* 

V 0 / ) tj\ \* ^ / t{ } tj, t k \ 


-1 + 


Developing the right-hand side of this relation we obtain the following 
formulae: 

/ = 1 r* (u 1+Wa ) - e* (»,) + 0 («,)] 

\«i, V L J 

a, (% £ ’ S “ iP ^ + £+ u ’) -Be* W + *U’+«!) + 2 / (£) + * («5) + * (£)j, 







h, h, 

Mi ,«», 



1 


■ • +t » 
e \“i+«*+-+«>. 


) 




^ +1 i + ... 
%+«, + ... 


+ Be 


■CMS 


+ 0 


/ <8+ t 4 +... + tp 
\U| + U 4 +.,.+Wp 


) 


+ t P \ 
+up) 


+(_ l)p-l (p-1)0* («h ) + * («l ) + * («l ) + ••• + * (t*)j 


hence the o’e and ffa connected with yjr( will be 

\Ml, «i, 

I (**•••• **\ 

* \«x, ..Up/ 

1 r 4 >( tl+ t » + - + l A /«,+ «s+...+ <*\ +0 /'«*«+••• + *»A 

= « \#,+«,+ ...+«p/ _ gsJ \» 1 +« 1 + V«*+ 1 +•••+%>/ 

+ ( *1 + **+...+ **V . / <*+! + ••• + tt+A , **+»+!+ — + { p\ 

+ 2! SSe \“i+ m j+ + u kJ \ M t+i+•••+“»«/ v \Mt+i+i+---+“p/ 


•(30“*); 




\«i / w 


+ ...+0 


and 


(£)] 


.(3in 


9 \U lt U*,...U p / 

= i7apf ^,, **,- M + iV^-lJSSaJ 41 ’ **’ ”• Sa„_* ( ***** - M 

V \u 1 ,u t ,...uj \Mi, M*/ v \U k+ l,...U p J 




h+l> 

■«w. 


/ * VM* +l+1 


■ *1 

W *+Z 


.... «p\ 

, ...M p / 


.(32 w *); 

hence it follows at once: 

{Zm' tuh Sm' itt)t } - jj(m il+ltth+jt - m {uh m itth ), 

i,i,, fg\ ~~ ^4.4^1+^it^+^i 

— +<j,a+fa - m A,fc m fe+<l,4+fc + m i%,h m i»,<»)> 

(3m <4, y, 3tti ^ ^ 4 j = ^ $ ( ?n <i+<i+<i+<4, MfM 


® ^1.^1 /|+^4 

— 8m <1> y 1 tn.y l ,y I m 4to y I tti <4> y 1 ) 

J7-1 

.(33). 











Finally I shall draw the reader's attention to the fact that the formulae 
established in this section present a similarity in their forms. This property may 
have been noticed before, but it is through the method of associated functions 
that we have obtained an explanation of the fact. We observe then, that—owing 
to the similarity of these formulae—knowing one of them, we are certain to obtain 
those which are analogous to it by making the necessary modifications. 

Further Results for the Distribution of the Semi-invariants.

10. Having obtained the general formulae for the distribution of sampling
moments we are able to proceed to find out the general expression of $L^{(0)}_{a,b,c,\ldots}$ or,
which is the same, the corresponding associated function.

If we put

$$G_1(t) = \frac{t}{1!}\,\delta_1 s_1' + \frac{t^2}{2!}\,\delta_1 s_2' + \ldots + \frac{t^p}{p!}\,\delta_1 s_p' + \ldots,$$

according to the relations (9) and (29) we shall obtain

$$G_1(t) = e^{-\phi(t)}\, g(t) \qquad (34).$$

Let us denote by $A_p^{(1)}(t_1, t_2, \ldots t_p)$ and $B_p^{(1)}(t_1, t_2, \ldots t_p)$ respectively the
associated function with the $p$th order product moment and $p$th order semi-invariant
of the $\delta_1 s_p'$'s.

Owing to the relation (27) we shall have

$$A_p^{(1)}(t_1, t_2, \ldots t_p) = e^{-\phi(t_1) - \phi(t_2) - \ldots - \phi(t_p)}\; a_p(t_1, t_2, \ldots t_p) \qquad (35),$$

where $a_p$ is the associated function with the $p$th order product moment of the
$\delta m_p'$'s.

Now by using the relations (25) we obtain further

$$B_p^{(1)}(t_1, t_2, \ldots t_p) = e^{-\phi(t_1) - \phi(t_2) - \ldots - \phi(t_p)}\; b_p(t_1, t_2, \ldots t_p) \qquad (36),$$

where $b_p(t_1, t_2, \ldots t_p)$ is given by (31).

The formulae (35) and (36) solve the problem which I proposed working out in
the second section. The method followed in attaining this aim constitutes an
example of the usefulness of the associated functions.

Let us put 

S4 p (ti, fa, ...t p )=*N p Si p {t lt fa,... fa) <*>-•••-4 (fa> 

— Ux+tg4*4 ^Ug>—■ —♦Up) 

_Se* «»+*»+ +y-* <*»>-* «»>- -* <‘p> 

+ Se* (t *+‘»+ +<p)-♦ «,)-*u 4 >- -♦(*p> 

+ (-l)P- l (p-l) . 

The relations (35) and (36) will give the following table of formulae: 

At™ 

-ds (1) (h, h, fe) = -jfi k« <*)> 


( 37 ). 









-‘l* 111 (ti, t%> U> h) = t%> t», U) + —S t%) S4t{h> tt), 

(^li ^1 ^4» ^)“ ^4i ^i) 

JV--1 

+ -»rj- S^(<i. tt)£4*(ta, t t , t s ), 
iy ( 10 ) 

^• (W (h> t%> U, ts, ^5» k) “ ti> t$, U, h t ts) 

jVr_i 

^ W“ [® <$4%(ti, tt) £4i(t%, ts, ts, ts) + S *S4s(ti, ts, t$) &4s(ts, t$, <*)] 

iy (15) (10) 

.| _ .^ S t£$a ((i, tf 8 ) S4% (ts, ^ 4 ) <$4s (ts, ts), 


*^7 (tlf t§, t , tj) — jjj-Q £4 7 (tfi, tf, ... ts, ty) 


N-l 

■+ Ttfo" [Sc5^a(^i, M + h, ts)S4i(ts, ts, ts, t 7 )] 

iV (2D (85) 

+ ---- g t*) t <4%(ts, U)64s(ts, ts, tj ), 

iV 0 ( 108 ) 

-4s(^i, ^a» ••• t 7 , h) — jjp £4's(ti, ts ,... tj, t^) 

+ m— [BMh, ts) SAs(ts, h,... *s)-f S 64s(h, ts, ts) Ms(h, £®> • •• t%) 
w m (so) 

+ S S4s(h, ts, ts, ts) Z4i(ts, ts, h, £ 8 )] 

(35) 

+ { -- } [ g s4s(t\, ts) 64s(ts, 1 4 ) S4i(ts, ts, tj, ts) 

iV (210) 

+ S S4%(ti, ts) '^s(ts, ti, ts) Ms(ts, tq, £ 8 )] 

( 280 ) 


_l_(-- ) g . r ^a (h, ts)S4s(ts, ti)^4s(ts, ts)S4s(ti, U) 

iV (] 05) 


and 


.(38), 


Bs {1) (ti,ts)=±64s(h,ts), 

Bs {1) (ti, ts, W* j^s ts, ts), 

•®4 a> (tit ts, ts, ti) = i a [6di(ti, ts, ts, ti) — S *S4s(ti, ts) '$4%(ts, ti)] , 
iy ( 3 ) 

-Bo* 1 * ( h , ts , ts , U , ts )** itfi [£4 s ( ti > ts , ... £5) — ® <$4 s ( ti , ts ) S 4 s ( ts , U , £5)], 
iY (10) 

Bs (1) (h, ts, ts, ti, ts, ts)=* -TfslS4*(ti» ^ 2 , — S S4s(ti, ts)S4i(ts, U, ts, ts) 

J* (15) 

— Ss4s(ti, t%, ts)£4%(ti, ts, k) + 2! S £4%(ti, t%)S4s(ts, ts)S4s(ts, fe)]» 

(10) (15) 



B'P* (ii> t%< foi • •• ti) — ~xfi ••• ti) — ® ^i) tt, ft)) 

li (81) 

— S -S4t(ti, t% y t%)t&f ^6> h) -h 21 S S^tipi) S4%(fa> t%> ty)], 

(35) (105) 

•Z?8 (1 * (^1 > ^2, ^3> • • * £g) — ^7 [- 5^8 (^1» ^2, • • • fo) ® (^1 ) ^t) (ifr, it> • • • il) 

•iy (28) 


— S £4&(tx } ig, is)r.^5(^ 4 1 ^5» • •• is)""S it) it» h)S4i(fa> t$> i?» is) 

(50) (S5) 

+ 2! S sA% (ii, it)^8(it) is> ie)t?^8(is) is) 

(280) 

+ 2 ! S f5^a(ii» t>t)S4*(tz } ie, ^ 7 > is) 

(210) 


-3! S ^ a (ix, ia)^ 2 (is, ^^(is, is)^(i 7 , is)] 

(105) 

I shall make one remark only concerning these formulae. 


•(39). 


One might think that there is a difference between the right-hand sides of (38)
and the series, such as ($H^{ter}$), which we have used above for obtaining the product
moments of the $\delta_1 s_p'$'s. There is no difference, however, for the series ($H^{ter}$) involves
terms which do not correspond to any product moment, such terms being eliminated
in a natural way from $\mathcal{A}_4(t_1, t_2, t_3, t_4)$. Consequently it is immaterial which of
these series we use for our calculations.


11. In the previous paragraph I have dealt with the correlated distribution of
the $\delta_1 s_p'$'s. In order to obtain better approximations or to have a certain measure
of the difference between the true values and those obtained by my previous
method, further calculations concerning $\delta_2 s_p'$, $\delta_3 s_p'$, ... are required. The method
which I am going to adopt is that which I indicated in the first section of this
paper, and the point from which I shall start will still be the relation (8):

$$s_1'\frac{t}{1!} + s_2'\frac{t^2}{2!} + s_3'\frac{t^3}{3!} + \ldots + s_p'\frac{t^p}{p!} + \ldots = \mathrm{Log}\left(1 + m_1'\frac{t}{1!} + m_2'\frac{t^2}{2!} + \ldots + m_p'\frac{t^p}{p!} + \ldots\right) \qquad (40).$$

If we put

$$G_r(t) = \frac{t}{1!}\,\delta_r s_1' + \frac{t^2}{2!}\,\delta_r s_2' + \frac{t^3}{3!}\,\delta_r s_3' + \ldots + \frac{t^p}{p!}\,\delta_r s_p' + \ldots \qquad (41),$$

owing to the fact that $\delta_k m_p' = \delta m_p'$ the relation (40) gives

$$G_r(t) = G_1(t) - \frac{1}{2}\{G_1(t)\}^2 + \frac{1}{3}\{G_1(t)\}^3 - \ldots + \frac{(-1)^{r-1}}{r}\{G_1(t)\}^r \qquad (42).$$

This relation enables us to obtain any desired degree of approximation in the
mean product moments of the sampling semi-invariants. And what is more, we
shall not need new calculations, for any function associated with $G_r(t)$ will be
expressed in terms of the functions associated with $G_1(t)$, functions with which we
have already dealt and which are all worked out.








Let $A_p^{(r)}(t_1, t_2, \ldots t_p)$ be the $p$th order associated function with $G_r(t)$; we shall
have:

$$A_p^{(r)}(t_1, t_2, \ldots t_p) = \int_{(D)} F\, G_r(t_1)\, G_r(t_2) \ldots G_r(t_p)\, dv$$

7*7* i *> H*i%x ... xi p ..•+*&'£>-<») .( 43 )> 


where 4?V.. +im (ti, t%,... t p ) stands for 

" * it H i. 


aV) 

'“<!+«+ ... +<„ 


(pit ••• t\, ^ 1 » ••• tt, ••• tp, tp, ... tp). 

^ -> ^ ... J : 7 

h it ip 


Before proceeding to calculate the approximated formulae in terms of
the characteristic function of the parent population, we observe that in order to
obtain the exact values of $\delta s_p'$ we must make $r$ equal to infinity in (42). Thus if

$$G(t) = \frac{t}{1!}\,\delta s_1' + \frac{t^2}{2!}\,\delta s_2' + \ldots + \frac{t^p}{p!}\,\delta s_p' + \ldots$$

we shall have

$$G(t) = \mathrm{Log}\,[1 + G_1(t)] \qquad (44).$$
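The relation (44) is an exact identity between power series, and it can be tested on any particular sample. The sketch below is an added illustration: the population, the sample and all function names are hypothetical. It forms the coefficients of $G_1(t)$ from the relation (A), takes the logarithm of $1 + G_1(t)$ as a truncated series, and compares the result with the differences between the semi-invariants of the sample and those of the population.

```python
from fractions import Fraction
from math import comb, factorial

ORDER = 5


def cumulants_from_moments(m):
    """Semi-invariants s_1..s_n from moments about a fixed origin (m[0] = 1)."""
    s = [Fraction(0)] * len(m)
    for p in range(1, len(m)):
        s[p] = m[p] - sum(comb(p - 1, k - 1) * s[k] * m[p - k] for k in range(1, p))
    return s


def reciprocal_series(m):
    """Coefficients M_p of e^{-phi(t)}, by the recurrence (C)."""
    big_m = [Fraction(1)]
    for p in range(1, len(m)):
        big_m.append(-sum(comb(p, i) * big_m[p - i] * m[i] for i in range(1, p + 1)))
    return big_m


def multiply(a, b):
    """Product of two ordinary power series, truncated to the length of a."""
    out = [Fraction(0)] * len(a)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(a):
                out[i + j] += x * y
    return out


def log_one_plus(g):
    """Log(1 + g) = g - g^2/2 + g^3/3 - ... for a series g with g[0] = 0."""
    out = [Fraction(0)] * len(g)
    power = [Fraction(1)] + [Fraction(0)] * (len(g) - 1)
    for r in range(1, len(g)):
        power = multiply(power, g)
        sign = 1 if r % 2 else -1
        out = [o + sign * c / r for o, c in zip(out, power)]
    return out


xs = [Fraction(0), Fraction(1), Fraction(3)]            # hypothetical population
ps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
sample = [Fraction(0), Fraction(1), Fraction(1), Fraction(3)]

m_pop = [sum(p * x**k for x, p in zip(xs, ps)) for k in range(ORDER + 1)]
m_smp = [sum(x**k for x in sample) / len(sample) for k in range(ORDER + 1)]
delta_m = [a - b for a, b in zip(m_smp, m_pop)]

big_m = reciprocal_series(m_pop)
# Relation (A): first approximation delta_1 s_p' in terms of the delta m'.
d1 = [sum(comb(p, i) * big_m[p - i] * delta_m[i] for i in range(1, p + 1))
      for p in range(ORDER + 1)]
g1 = [Fraction(d1[p]) / factorial(p) for p in range(ORDER + 1)]
delta_s = [c * factorial(p) for p, c in enumerate(log_one_plus(g1))]

s_pop, s_smp = cumulants_from_moments(m_pop), cumulants_from_moments(m_smp)
print([delta_s[p] == s_smp[p] - s_pop[p] for p in range(1, ORDER + 1)])
```

Stopping the logarithmic series at its $r$th power gives the truncated approximation $G_r(t)$ of (42); for instance the coefficient of $t^2/2!$ is $\delta_1 s_2' - (\delta_1 s_1')^2$ once the square is included.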

By using the general formula (28), the function giving all the associated
functions with the product moments of $\delta s_p'$ will be

hence 


f Fx 11 + 0 1 (t 1 )Y [1 + 0 1 (t % )f ...[1 + G 1 (t p )Y ...dp 

J(D) 

= !+- %< Ax'iti) +1* 2,., (tM + . 


.(45); 


1 * 


I,S, ( <*,.<,). 


Ap'(t 1 ,t t ,...tp) 


i l i l * *ix H x ... x t„ *+*•+ • 7 ,.® / 


.(46). 


Now owing to the formula (25) the associated functions with the semi- 
invariants are straightforward: 


ft (t f\-y. ? (-I)*'-’ 


- [<*& <*.« - A ? (<t) A T (*•>] > 

B »(tu tt, [<, + *(fc. «.,*.) 

- S^ t (1) (t0 A? +k (t,, tt) + 2 A™ (h) Aj» (U) A? (t a ) 


i * 




Contributions to the Sampling Problem 


k< ••• tp) 


00 



? ? ( -pw-V f 4 m 

f<* f<p ij x it x ... x i, i <*+'«+•••+’«> 


(k, 



■tp) 


-ss <U + ... + * 


*1 *'* ** 


(tafl> ^b+8> ••• ^f>) 

**+l **+• % p 


+ 2! SS +ijfc (<j, k, 


h) A 


(i) 

ijt+l+ ... +U+I 


^i*+l+i+ .. *Hp 


U+J+l *- 


+ (- l)"- 1 (p- l)!il®<«») (4) 4j(4)l .(47). 

i'l «t ip J 

These formulae are very convenient especially for approximate calculations. 
Thus, in order to have an approximation as far as $1/N^2$ we must take in the 
development of the right-hand side of (47) only the terms of which the index does 
not exceed four. 

Considering the expression of given above, we shall have 
B% Ci> £ 2 ) — ^y* (ti »^ 2 ) 

+ Wi [&* (^i) (h , £ 2 ) + 1 (S4z (£ 1 , t 2 )) 4- < r Az (<i) S4 2 {t \, £») 
iV 2 22 

—£ 2 ) — i sA*(ti, £ 2 )] 4-..., 

2 2 

^3 Cl, ^ 2 » fa) = ['^8 (^ 1 » ^2 » fo) — ' f A% (^ 1 , ^) S4z(t\y tz) 

— h) k) — ti). r ^ 2 (tz % k)] 4- ...» 

tz, < 8 , £ 4 ) = 2^2 ['^4 Cl> t 2 > ^3> ^ 4 ) 4” ^2(h, h) -SA 2 (fa, fa) 

4- ,^2 (fa, fa) ^2 (fa, fa) + c.^2 (4 , < 4 ) ,^2 (fa, fa)] +.(48). 

If we compare these formulae with (89), it will be seen that in the development 
of Ifa(fa, fa, h) we obtained new terms in the coefficient of Although these 

terms are connected with the first approximation of 5 8 , they could not be obtained 
at that stage. The explanation is obvious: they follow from the fourth order product 
moment 

Substituting in (48) the expressions of the $\mathcal{A}_p$ given by (37) we shall have 
j?i (k, t t ) = 4 [e* «i+V-* «!>-+ <*a> _ 1 ] 

( 2 tfj)—(^>—30 (<j) ( 2 ij)— 2# (4j) 

+ <*i>-**«i) ~ ^0x+<,)-4 «,) 4 . £ 

4 - 







t%, <,) = [>*«H>-*<*.>-*«»> _ e*«,+<«>+* «,+<,>-*«,)-*«,)-♦ (*,> 

_- 0*«i+«.>+*<<«+*«>-* IV -4 «•> 

+ <%+<«>-* «i>-+ <v + g* <M-V-+ ««>-+ ««> + e* **•+<!*-* «,>-* «t> — i] +..., 

£4(^1, <1. t*, t t ) = [Se* (< i +i »'+*«»+V-*<V -4 (V^(V-*nt t ) 

- + 3] +.(49). 

For practical calculations we can neglect in these formulae the terms which do 
not involve all the variates $t$ (such terms occur in the development of 
$B_3(t_1, t_2, t_3)$), for these terms, as we noticed before, cancel when the 
calculation is worked out. 

To find the coefficient of $\frac{t_1^a}{a!}\frac{t_2^b}{b!}$ in the development of $B_2(t_1, t_2)$ we proceed as 
follows. 


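Let us consider first the series 

$$e^{\phi(t_1+t_2)-\phi(t_1)-\phi(t_2)} = \sum_{a}\sum_{b} N_{a,b}\,\frac{t_1^a}{a!}\frac{t_2^b}{b!}.$$

Owing to the fact that 

$$\phi(t_1+t_2) - \phi(t_1) - \phi(t_2) = s_2\,\frac{t_1}{1!}\frac{t_2}{1!} + s_3\Bigl(\frac{t_1^2}{2!}\frac{t_2}{1!} + \frac{t_1}{1!}\frac{t_2^2}{2!}\Bigr) + s_4\Bigl(\frac{t_1^3}{3!}\frac{t_2}{1!} + \frac{t_1^2}{2!}\frac{t_2^2}{2!} + \frac{t_1}{1!}\frac{t_2^3}{3!}\Bigr) + \dots,$$

we shall have 

$$N_{a,b} = \sum \frac{a!\,b!}{(i_1!\,j_1!)^{g_1}(i_2!\,j_2!)^{g_2}\dots} \times \frac{1}{g_1!\,g_2!\dots}\; s_{i_1+j_1}^{g_1}\, s_{i_2+j_2}^{g_2}\dots \quad (50),$$

the sum being extended to all the values for which 

$$\sum ig = a, \qquad \sum jg = b,$$

the $i$'s and $j$'s being always different from 0. This formula is similar to that which 
gives the product moments of a distribution of two variates in terms of the 
semi-invariants. 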
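The partition sum just described lends itself to a mechanical check. The sketch below is an illustration under stated assumptions rather than the author's procedure: it takes the coefficient of t_1^a t_2^b / (a! b!) in exp{φ(t_1+t_2) − φ(t_1) − φ(t_2)} to be the sum, over all ways of splitting (a, b) into parts (i, j) with i ≥ 1 and j ≥ 1, of a! b! divided by the product of (i! j!)^g g! over the distinct parts. The function names are hypothetical, and each result is returned as a mapping from the orders of the semi-invariants involved to the numerical coefficient.

```python
from collections import Counter
from fractions import Fraction
from math import factorial

def bipartitions(a, b, largest=None):
    """Yield each multiset of parts (i, j), i >= 1 and j >= 1, summing to (a, b)."""
    if a == 0 and b == 0:
        yield []
        return
    if largest is None:
        largest = (a, b)
    for i in range(1, a + 1):
        for j in range(1, b + 1):
            if (i, j) <= largest:  # non-increasing order, so each multiset appears once
                for rest in bipartitions(a - i, b - j, (i, j)):
                    yield [(i, j)] + rest

def coefficient_table(a, b):
    """Map a sorted tuple of semi-invariant orders to its numerical coefficient."""
    table = Counter()
    for parts in bipartitions(a, b):
        denominator = 1
        for (i, j), g in Counter(parts).items():
            denominator *= (factorial(i) * factorial(j)) ** g * factorial(g)
        orders = tuple(sorted(i + j for i, j in parts))
        table[orders] += Fraction(factorial(a) * factorial(b), denominator)
    return dict(table)

print(coefficient_table(2, 2))  # terms in s_4 and s_2^2
print(coefficient_table(2, 3))  # terms in s_5 and s_2 s_3
```

For a = b = 2 the two admissible splittings are the single part (2, 2) and the pair (1, 1), (1, 1), which reproduces the direct expansion of the exponential to that order.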


Continuing, 

A 7 «» g_a!_6! c! _ 1 ox g% 

a > b > c (i t Iji ! k y \yi (i 2 lj 2 !k t 1 )°*... gi ! g 2 !... Wl+Aci ’ 

in which, as before, 

2ig « a, Xjg * 6, 2kg = c, 

i ^ 0, j ^ 0, k tft 0. 

ti a tf 

In the same way we obtain for the coefficient °f^j ^ * n 

(tx+<,)+^ ( 2 ^-aM <!)-<*«,) 

a! 6! 1 


.(51), 


the formula 


S_ X _I_ 

(*i!ii!)**(»» 1^* !)**•.. fills'*!... *•* ** 


•(52), 







in which » a, Ijg = ft, i ^ 0, 

and S{ , i = s (+j if y>0) 

.*,-<2*-2)* if j -OJ 

The coefficient of the same term in the development of 


m 


will be given by the same formula (52), but with this difference, that the sum is 
extended to the values for which 


and 


tig = a, Ijg m ft, i + 0, 

s it i -2 t s i+j if 0 
#„# = (2*— 2)s< if j — 3. 


.(54). 


The similarity of the previous formulae enables us to write down the following 
relation: 


•2_ 2 °- , + 2 ft - 1 -2»- l -l\ 

' \N N * 


a! 6! 


_ a! ft! __1 

x gi'-g^ 
l 


\+Jl '' - 


j. >]L g. „ . _ ,7 * v • v . A _ V f/1 offi 

* (»i! ji!) Bl (4! jt !)*»..• <71 ’<y 2 !... "• •» '»• * '" 


i / ■ I r 


a\ b\ 


8°' ... 




iS/rr 


«! ft! 


, sf*. 


2 A 7 2 ' (h !Ji (itljt !)“* • ■ • r/j ! y 2 !... it.it'" 


1 a! ft! 

! /„' - t ~i\«. / • t *' 




2jV2 ^ (h-Ji-)" 1 (i*!y*!?*... x gi’g2- ...• 

In this formula the sums are extended to all the values for which 


.(55). 


% = «> % = ft, 2 g»-g, 

the sum $S$ involving the values of $i$ and $j$ different from 0, whereas in the sum $S'$ 
at least one of the values of $i$ is equal to 0, in which case 


for S it and 
for S/. 


* = s i+i 

s <> * = (2* — 2) 8 j 

s i>i~ 2 ig i+l 


if i±0,j±0, 
if i~0,j*0, 
if i*0,j + 0. 


*<,, = ( 2’-2)sj if iutO.jtO, 


Similar formulae may be easily established for the other developments, but it 
is not necessary to dwell on them now; the reader will not have any difficulty in 
writing them down. 


However, I have calculated a table of the first coefficients of the development 
of $B_2(t_1, t_2)$, i.e. of the first second order product moments of the sampling 
semi-invariants, which will be found at the end of this paper. 





Associated Functions of the Distribution of Sampling Product Moments about the Mean. 

12. In this section I shall make an application of the method of associated 
functions to the only problem which has not yet been attacked in the previous 
pages, to the distribution of the sampling product moments calculated about the 
mean of the sample. 

If 

we shall have 

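$$\chi(t) = 1 + \mu_2\frac{t^2}{2!} + \mu_3\frac{t^3}{3!} + \dots + \mu_p\frac{t^p}{p!} + \dots = \Bigl(1 + m_1'\frac{t}{1!} + m_2'\frac{t^2}{2!} + \dots + m_q'\frac{t^q}{q!} + \dots\Bigr)e^{-m_1' t} \quad (56).$$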

Let us put 

Vr (0=£ Kti + £ «,/%' + -+£, W +.(57); 

since 


= (- *Y (i + mf l + W |j + - + < £ + •••) + *' (- ly-'tw, 


— (_ A<—1 j? g-m/f 




T^-7 - 0, 


we shall have 


7r (<) - S' (<)[*- fj Sm i' +1-, - • • • + (~I)i gr_1 m »'] 

- e* «» [I, W-|6W+...- ( ~p .(58), 

where we have supposed that the origin is taken at the mean of the parent 
population. 

If $r = 1$ we obtain $\delta_1\mu_p = \delta m_p' - p\,m_{p-1}'\,\delta m_1'$, 

a formula given by Professor Karl Pearson*. 

Now let us make in (58) r equal to infinity; we shall have 

7 (i t ) * [p (0 + ^ <«>]- e* <«.(59). 

Substituting in this relation the value of g(t) given by (29) we obtain 




.(60). 


* Biometrika, Editorials, Vol. II, pp. 273-281 and Vol. IX, pp. 1-10. 









Before commencing the calculation of the associated functions of y(t) I shall 
consider the expression 

‘W V J(D) li 1 

+ F‘(' - Jf)(* -1) - (‘ - V) (S ^ +S '‘”**“ ) 

+ 3? i ( l _ y)(' ~f) ••• ( l 

+ y -*! 1 - y )(> - - (' - + 

-f S ^ia+^»4+^5«7 4- S 


where 

and 


T=ti + ti+ ... -f > 


»+(-*)• 

We observe that owing to the relation (28), in calculating the associated 
functions of the distribution of the $\mu$'s, we can neglect the second term on the 
right-hand side of (60), i.e. $e^{\phi(t)}$, for that amounts to a "change of origin". 
Thus 

@p(h, ^a> ••• ^i») = ^j»(^i> ^a> ^p) — SSfljf(<x, <2> ••• £*)^p-*(^*+i» £*+a> ••• £p) 

-f 2! SS0* (^i, fo,... t k ) flj (t k+ i ,... £* +l ) ,.,t p ) 


«- (i - 


(63), 









+ 1 < l "- ,> *(-s) t »( r -s) 

N * 

_ ^1 _ ^ s e N ' 2} *'( " + * ' t ' 1 ^ 3 ) + '* ( h - *$r *} + ‘■ 1} * (" n) * * ( f » - N ) 

_ I s e {N - 1} 0 ( ■ h ~W S ) +0 (*» + ~ tl W S ) + (* -1) *(-£)+*(*- J) 

N 

+ 11 [* (- ^> + *( - &)'-* (- ^)] + < f (*■ - &) + * (*• - S) 


.(64). 


13. In order to deduce some practical results from (62), I shall use—as I did 
before for the distribution of the sampling semi-invariants—the well-known 
formula which gives the moments in terms of the semi-invariants. 

Let us consider the characteristic function of a multivariate distribution 


2 2 2 t a u h v c 


v . v < y > , *“ u p V 


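We shall have 

$$\mu_{a,b,c,\dots} = \sum \frac{a!\,b!\,c!\dots}{(i_1!\,j_1!\,k_1!\dots)^{g_1}(i_2!\,j_2!\,k_2!\dots)^{g_2}\dots} \times \frac{1}{g_1!\,g_2!\dots}\; s_{i_1,j_1,k_1,\dots}^{g_1}\, s_{i_2,j_2,k_2,\dots}^{g_2}\dots \quad (65),$$

where the sum is extended to all the multipartitions 

$$\sum ig = a, \qquad \sum jg = b, \qquad \sum kg = c, \ \dots$$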

The developments of the terms involved by the formula (63) will be 

(X - 2)« (- *£*) + 4. (h - ±p) + * _(<• - : * p) 

- !•' F (- sT 1 o-^r+c-^+Jf-q«... % . %. 

- F F (- r ta - - a - if a •-« %.%. 

- f-(-^)" K 1 - ... m 








Now, if we designate by $S(a, b)$ the second order semi-invariant corresponding 
to $\mu_a$ and $\mu_b$, by using the formula (65) we shall obtain 

1\_ a! 6! 1 

x [N -2 + (1 - N)h + (1 - N)h]H [N - 2 + (1 - N)h + (1 - N )fc]* ... 


+ ( 1 )° +6 jya+6+l 


S 


a\b\ 


x a" 1 a gt 
* 8 ii+j, 8 u+jt ’ 


( 4 ! ji -) >l ( 4 ! ji !)'* • • • gi'g*'--- 
x [(1 - N)H+h - (1 - Nfti [(1 -Nf*h ~(l± N)}°* ... x C* Ci. • 


( N) 


I\ 


o+b 


s 


a ! 


(i x !)»j (4 !)»*.. 

b ! 


x- ", [(l-^-(l-W-x<‘C 

yj: yji... 


I shall make an application of this formula to the calculation of $S(2, 4)$. In 
that case we shall have the following bipartitions: 

    2
    4                              giving $s_6$, 

    1 1      2 ·      2 ·
    3 1      2 2      · 4          giving $s_4 s_2$, 

    1 1      2 ·
    2 2      1 3                   giving $s_3^2$, 

    1 1 ·      2 · ·
    1 1 2      · 2 2               giving $s_2^3$. 


We observe that not all these bipartitions must be considered for the calculation 
which is to follow. Thus, the last bipartitions of those giving $s_4 s_2$ and $s_2^3$ must be 
neglected, for the expressions following from them vanish. It is easy to see that, 
for these bipartitions, the terms which proceed from the first two sums of (67) are 
cancelled by those of the double sum of the same relation. Therefore we must neglect 
at the same time the terms following from this double sum. 
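The enumeration of bipartitions for S(2, 4), and the removal of those to be neglected, can be reproduced mechanically. The sketch below is illustrative only and rests on two assumptions that are not spelled out at this point of the text: parts of weight 0 or 1 are dropped (the origin is at the mean, so s_1 = 0), and a bipartition is kept only if at least one part has a non-zero element in both rows, so that the two rows are connected. All names are hypothetical.

```python
from collections import Counter

def bipartitions(a, b, largest=None):
    """Yield each multiset of parts (i, j) of weight >= 2 summing to (a, b)."""
    if a == 0 and b == 0:
        yield []
        return
    if largest is None:
        largest = (a, b)
    for i in range(a + 1):
        for j in range(b + 1):
            # weight-0 and weight-1 parts are skipped: s_1 = 0 with the origin at the mean
            if i + j >= 2 and (i, j) <= largest:
                for rest in bipartitions(a - i, b - j, (i, j)):
                    yield [(i, j)] + rest

def orders(parts):
    """Orders of the semi-invariants whose product the bipartition gives."""
    return tuple(sorted(i + j for i, j in parts))

def connected(parts):
    """True when some part links the two rows (both elements non-zero)."""
    return any(i > 0 and j > 0 for i, j in parts)

found = list(bipartitions(2, 4))
by_product = dict(Counter(orders(p) for p in found))
kept = dict(Counter(orders(p) for p in found if connected(p)))
print(by_product)  # how many bipartitions give each product of semi-invariants
print(kept)        # the same count after dropping the disconnected ones
```

Under these assumptions one bipartition is dropped from those giving s_4 s_2 and one from those giving s_2^3, in agreement with the remark above.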





The terms corresponding to each bipartition are now calculated in turn. The 
coefficient of $s_6$ is 



The coefficient of $s_4 s_2$ is 


^ i {l-j)[N-2 + {l-N)* + l][N-2 + 2(l-N^] 
+ ji(l-^)[N-2 + (l-Nf+l][N-2 + 2(l-N)] 


In the same manner the coefficients of $s_3^2$ and $s_2^3$ are found to be respectively 


hence 


N\ 


*««-»(>--»)’(«■-»+*)• 


N 

6 

N 


N + N*) 


( 1 -y)( 1 -»)'-• + F( 1 -y)’‘*■.«*>■ 


Craig has obtained this formula by a different method *. 

By a closer study of the formulae (61) and (62) one may construct a general 
procedure for obtaining any semi-invariant of the correlated distribution of sampling 
moments. This method is based upon the calculus of multi-partitions, and personally 
I think that this is a natural way of attacking similar problems, the multi-partitions 
being necessarily introduced into our calculations by the form of the formula giving 
the semi-invariants in terms of the moments or vice versa†. 

The rules which enable us to make use of the calculus of multi-partitions are 
the following: 

(1) The calculations will be made easier if we suppose that we deal with 
a sample of N +1. 

(2) We shall eliminate from our calculations all the multi-partitions which 
have not all the rows connected by elements of columns such as 

. 72 * 2 • • • 

•15. 42 • • 

3 • • 6 • • 3 3 

4 2- .45 


* See C. C. Craig, loc. cit. p. 82. 

† A similar procedure, but for the cumulative moments of the sampling cumulative moments, has 
been indicated by R. A. Fisher. The considerations which led him to the result are, however, different 
from those contained in this paper. 





We observe that these multi-partitions may be decomposed into several others. 

(3) The coefficient derived from a $p$ row partition will be a fraction. The 
denominator of this fraction will be $(-1)^s (N+1)^{s+p-1}$, $s$ being the arithmetical sum 
of the elements of the corresponding partition. 

(4) To obtain the corresponding numerator we shall proceed as follows: 

(a) We shall form all the multi-partitions which may be derived from the 
initial one by adding in every possible way the elements of the columns. Thus the 
multi-partitions derived from the four row partition 

1 • 2 ■ 

111 
• 2 1 1 
112 

will be 2121 12 3 1 2122 

• 2 1 1 111 111 

11-2 11-2 -211 

1 -2- 1-2- 1-2 

1 3 12 223- 11 1 

112 -211 13 13 

2121 1231 2122 

1313 22-3 1 3 12 

> 2 rows, 

2332 3 22 3 23 3 3 2414 

11-2 -211 11-1 1.2 -j 

3 4 3 4 } 1 row. 

(b) A $p$ row partition will contribute with the first factor 

$$N(N-1)(N-2)\dots(N-p+2),$$

which must be considered equal to 1 in the case of $p = 1$. 

(c) Each column contributes with the factor 

$$N - p + 1 + (-N)^{a_1} + (-N)^{a_2} + \dots + (-N)^{a_p},$$

where $a_1, a_2, \dots a_p$ are the elements of the column considered. 

(d) Finally the numerical coefficient will be given by the well-known 
expression 

$$\frac{a!\,b!\,c!\dots}{(i_1!\,j_1!\dots)^{g_1}(i_2!\,j_2!\dots)^{g_2}\dots} \times \frac{1}{g_1!\,g_2!\dots}.$$

Although the calculation of any coefficient can be carried out through these 
rules it is preferable to obtain some new results in order to shorten the labour. 
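Rules (3) and (4) can be turned into a short computation. The sketch below is a hedged illustration, not the author's own scheme: it assumes that the derived multi-partitions of rule (4a) are obtained by adding rows together in every possible way (every grouping of the rows into blocks), that a zero element contributes (−N)^0 = 1 to the column factor of rule (c), and that the denominator of rule (3) reads (−1)^s (N+1)^(s+p−1). That last reading is an assumption of this sketch, chosen because it reproduces the classical variance of the second sample moment. Rule (2) is not implemented, so only connected multi-partitions should be supplied. All function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod

def set_partitions(items):
    """Yield every grouping of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for k in range(len(part)):  # put `first` into an existing block
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        yield [[first]] + part      # or into a block of its own

def first_factor(p, N):
    """Rule (b): N(N-1)...(N-p+2), equal to 1 when p = 1."""
    return prod(N - k for k in range(p - 1))

def column_product(rows, N):
    """Rule (c): product over columns of N - p + 1 + sum of (-N)^a."""
    p = len(rows)
    return prod(N - p + 1 + sum((-N) ** a for a in col) for col in zip(*rows))

def function_of_N(rows, N):
    """Sum over the derived multi-partitions, divided by the assumed denominator."""
    m, s = len(rows), sum(map(sum, rows))
    width = len(rows[0])
    total = 0
    for blocks in set_partitions(list(range(m))):
        merged = [[sum(rows[r][c] for r in block) for c in range(width)]
                  for block in blocks]
        total += first_factor(len(merged), N) * column_product(merged, N)
    return Fraction(total, (-1) ** s * (N + 1) ** (s + m - 1))

def numerical_coefficient(rows):
    """Rule (d): a! b! ... over the product of (i! j! ...)^g g! for distinct columns."""
    numerator = prod(factorial(sum(row)) for row in rows)
    denominator = prod(prod(factorial(a) for a in col) ** g * factorial(g)
                       for col, g in Counter(zip(*rows)).items())
    return Fraction(numerator, denominator)

def coefficient(rows, N):
    return numerical_coefficient(rows) * function_of_N(rows, N)

N = 5  # the sample then contains N + 1 = 6 observations
print(coefficient([[2], [2]], N), Fraction(N**2, (N + 1)**3))       # s_4 term of S(2^2)
print(coefficient([[1, 1], [1, 1]], N), Fraction(2*N, (N + 1)**2))  # s_2^2 term of S(2^2)
```

With a sample of N + 1 the two printed pairs agree, matching the familiar variance of the second sample moment, N²/(N+1)³ s_4 + 2N/(N+1)² s_2².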

Let us denote by at 1 ai 8 a x * ... ai" 

jj j a a 1 «2 a a a 8 ... «2 n 


, cl n t 1 G-ro* &m 8 ••• dm J 






the numerator of the function of N connected with the multi-partition between 
brackets and with those derived from it. On the other hand let us put 

- n [N - m +1 + (- Nfj + (_ Nyn* +...+(- Nfm *]; 

1 

we shall have 




I 

c 

yi 

e 


a! 1 ai® .. 

ai n 

p 

f, 1 a** . 

.. a* n 

= N(N-1) ... (N-m+ 2) 

aa 1 a 2 2 • • 

a» B 


.‘m 1 «m 2 • 

•• «m’‘_ 


a* 1 a m ® ... 

• « m n 








a x l + a 2 x 

«i 2 +a * 2 

... ai" + «*" 


+ 

N(N- 

- 1 ). 

..(N- 

- m + 

00 

OQ 

W 3 1 


cr 8 n 








^m 1 

«m 2 

a m n 








ui 1 + a * 1 

4-as 1 a 1 *+a** + a a 2 ... 

t*i n + a* n + 03 * 

+ 

1T(N- 

- 1 )., 

:(N~ 

- m -f 

4)S 

a 4 l 


a 4 * 

a 4 n 







«m 



am" 







O 1 1 + a * 1 

«i* + a 2 ® 

... ax n + a* B 








a ® 1 + a 4 J 

O 3 2 + «4 2 

... o 3 " + « 4 " 


+ 

N(N- 

■ 1 ). 

..(N- 

- ra + 

4)S 

as 1 

® 6 * 

... a 6 " 









«m 2 

a w n 



+ . («)• 


Oi 1 

a? . 

.. a x n 

at 1 

«** • 

.. a*" 

Ctm 1 

Om • 

.. C 


It is not at all difficult to show now that, if $m = n$, we have 



w 

“a 1 .. 

Cfn 1 " 


w 

OX® . 

. a,"' 

p 

ttl 2 

«u 2 

«n 2 

= 7 J 

'2 1 

«a 2 . 

. a 2 n 


J*l n 

a*« .. 


1 

.^n 1 

« „ 2 • 

•• «*"_ 


and that 


raj 1 

«1 2 

... a x n 

0 0.. 

«1 







a* 1 

a a 2 

... G 2 n 

0 0.. 

0 




[W 

a x 2 - 

. ax"~ 

a* 1 

«»* 

... a m " 

0 0.. 

0 



=(iv+ 1 y+*p 


a 2* • 

. ri 2 n 

0 

0 

... 0 

0 0.. 

0 

1 

•a 


.am 1 

«m 2 • 


- 0 

0 

... 0 

0 0.. 

0- 

1 







ft 


(*). 



















since we deduce that we can change the columns into rows and vice versa, the 
function $P$ will not be altered even if $n \neq m$. 

Another important property of the function P is the following: 

Oj 1 af ... oi* Xi‘ 
p | a* 1 a** ... a#" X» 


L^m 1 a m ••• C K_ 

fax 1 ax* ... Ox* 

1 | 

= [iST- w . + i+(_A/. + (-A 7 ) A »+ ...+(-JV)H^ “* 

/m 1 °m* a m*J 
V + "2 1 «1* + "2 2 • • • «!** + a 2 


+ S[(— N) k i — 1][( — iV)*» — 1] P 


03 1 «s* a»" 


w m 


^m W J 


+ S [(— iV ) A 1 — 1] [(- _ 1 ] [(^ iNT )^3 1 ] 


xi J 


4-. 

As we have 


«■} + (Is 1 

Ol 2 + O2 2 + «s a 

.. ai n 4- «* n 4- «8 n " 

tu l 

«4 2 

... a t n 


a m 

C 



.(c).' 

+ #][<- 

- Nyn+' + N).. 

. [(-Npi+' + N) 


= (- 1)^+0,+ *H*n+«A n (i\ r 4 1)" 
x [A°i — A 0 !” 1 4* iV 0 !" 3 — - • • ± 1] 
x [A 0 * - JV®*- 1 4 * iV®*-* — ... 4 1] 


[A®n - 4 - N*n~* - ... ± 1], 

the relation (c) enables us to deduce by complete induction that the polynomial 
function $P$ is divisible by $(N+1)^{n+m-1}$. Let us put 


ai 2 ., 

- ax"' 


V ax 2 .. 

,. «x" 

a 2 * .. 

.. ag” 

(-1)* 

(iV + 1 )«+*»-!' 

a 2 i aa 2 


a** • 

•• «m"J 


Lam 1 a m * . 

•• a ra ”j 


then $\mathfrak{N}$ will be the numerator of the function connected with the multi-partition 
between brackets after every simplification has been made. The denominator will 
be in that case $(N+1)^{s-n}$. 








If we put X—\ 4 + Xt + ... + X», the relation (c) will become 


tti 1 Ol* ... 0| w \i 

at 1 of ... a*" X» 


Oj* 1 a m * ... a m " X Wi 


1 -S 


Xj Xj ... x m 
at 1 a, 1 ... a m J 


,oi" a*" ... a " 


= F+T [jr “»» + 1 + + (- ^> + ...+( - iV)H gi 


Ol 1 Of* ... Ox" 

at 1 at* ... at" 


a*, 1 a m * ... a„ 


+ (3TTi? 8 [( ~ ^ ~ 1] C( “ N)kt ~ 1] 31 


a^ + cij ax* + Ot* ... aj" + 0/1 


«3 X 

a,* .. 



«m* .. 

a m ' 


+ <FTi)» s [( ~ N)kl " 13 - 13 K“ N r> - 1] 

rai 1 +Ot 1 + a s 1 ai* + at* + as* ... ai B + a a " + «a n ' 


a* 1 a 4 * ... a t n 


L V «»* ... a m » J 

Some particular cases of this formula will be of great use for practical purposes. 
The most useful are 


V a i* 
a/ at* 


<ii n a 

at" 0 



, +... + i)5{ [ a ] 


1 [tf*- 1 - +... ± 1] [iRxt (a) -91 (a)] 


31 a, 1 a,* 


a-m* 


a„» 1 “ (—1)* _1 (p — 1) 5R (a) + ( — 1)* S (>) 91u (a)+ ... 

a " 0 


+ (-l)» +1 S { p ) 91m(a)+ ... + 91m ..j>(a).(iii). 

The last formula becomes 
fax 1 ax* ... ax* IT 


91a, 1 a„* ... a," 1 


.a* 1 a,,* ... a m " 0. 
























if 91(a)»9lu(a) = 9lu(a) = --- 5 =9f».,,(a) = 0. 

As an example we shall calculate the coefficient of $s_2^5$ involved in $S(2, 4^2)$. 
There are three different multi-partitions to be considered: 

( 2 ) ( 2 ) (») 


• 1 1 • • 

2 • 1 • 

2 - 1 1 

2 

11-2- 

4 1-2 

4 11- 

4 

1-1-2 

4 11- 

4 11 

4 

2 2 2 2 2 

2 2 2 

2 2 2 



The second one must be considered twice for the last rows cannot be 
interchanged. 

The numerical coefficients are respectively 


2t4t4i 2! 4! 4! 2! „ 

= 288, -^- 0 ,* r =288, 


2 ! 2 ! ’ 2 ! 2 
For the numerators we shall have 
1 1 


2! 4! 4! 
3 !” 


= 192. 


91 


91 


91 


1 • 
1 1 


1 


= (91(2))* 91 


• 1 
1 • 
1 1 


= (91 (2)) 8 = iV s , 


1 1 1 
1 1 1 


- m (2) at [ 2 J J] 

\ \! !]-»(*■-*+»); 




hence the coefficient of $s_2^5$ will be 

$$\frac{192\,N(4N^2 - N + 1)}{(N+1)^5}.$$

We notice that owing to the fact that the function $\mathfrak{N}$ is not altered if the 
columns are changed into rows and vice versa, we shall have the same function of 
$N$ for the coefficient of $s_2 s_4^2$ involved in $S(2^5)$ (excepting perhaps the power of the 
denominator $N + 1$). The corresponding numerical coefficient may be deduced 
from the previous one. 

This property is quite general and is very useful for drawing up a complete 
table of formulae. As a particular case we notice that the problem of working out 
the semi-invariants of the sampling $\mu$'s for a normal parent population is closely 
connected with that of the calculation of $S(2^k)$ when the sample is drawn from any 
population. 

As a final application of the method worked out in the present section I shall 
consider the general relation 

F(N ) 


S( 2‘, 3*, 4*,...) = 


K (N +1 





where the parent population is normal and 

$$2\alpha + 3\beta + 4\gamma + \dots = 2p.$$


Let us consider also the relation 

S<2‘« », 

and let 

3 

4 


(0 0) 


nri.-p. 


2 2 


be a multi-partition used for the calculation of $S(2^\alpha, 3^\beta, 4^\gamma, \dots)$. We shall obtain 
the corresponding multi-partitions of $S(2^{\alpha+1}, 3^\beta, 4^\gamma, \dots)$ as it is shown in the 
following scheme: 


1 

. 1 

2 

i 

| 

2 

1 

.0 

3 


m.-p. 




4 

~2 

2 2 2. 



We notice that owing to the relation (iv) the function $\mathfrak{N}$ will be the same for 
both these multi-partitions; their numerical coefficients will be in the ratio $1 : 2p$. 
That fact enables us to write down the following general formula: 


S(2*,W, 4r, •••) = ( 


_ 2 
N + 


\ (P !)• SGI* 4 * 5 * 

l) (» — a — 1 )! ( ’ ’ " 




which is true for samples drawn from a normal population. 
Some particular cases of the formula (v) are the following: 

2"- 1 . (a — 1)! 

'—{mi y- 

3.2“.(a + 1)!JV 2 
(N+lf + * 

3.2* .(a + 2)\N (N — 1) 


S( 2«) 

-S'(2*, 4) 
S(2 a , 3*) 
8 (2% 4 s ) 


, a+2 


(N + 1Y+* 




tt+S 


• 2“+*.(a + 3)! 


N(4,N*-N+l) 
(N + D*+* 1 


.<*+« 


-S' (2 a , 3* 4) - 9.2*. (a + 4)! J ) * a+# - 


(v), 




Normal Distributions. 

14. Let us consider a normal population and take its standard deviation as unity; 
the corresponding characteristic function will be $e^{t^2/2}$. Under these hypotheses the 
first formula of (49) becomes 


N y 

+ - A* - 1 A*- + 1 ) +. 

We have, on the other hand, 

• J£L ,_!CL 

o v o k ! (p + k )! (p — k )! 


.(69). 


(70), 


eh*+*h <> = £,12P“* 

o P o 4! ’ (p + k)\' (p — 4)1 

where we must distinguish three possibilities: 

(1) $a + b = 2p + 1$. In that case 

$$L_{a,b} = 0 \quad (71).$$

(2) $a + b = 2p$, $a - b = 2k$, $k > 0$. Owing to the relations (70) we obtain 

r p+4! 2*’ - * -1 - 1 

^ a ’ b I fc! • JV a + .( 72 )- 

(3) $a = b = p$. In the same manner, as before, we find 


4—•- 


■'a, 6 : 


ilM 


K 1 iv + -). 

For the associated function of the third order semi-invariant we have 
1 


.(73). 


•fij (h.tt, ts) = 2^2 (eh*t+h*n+*th - e h «.+*.> - eV‘ 3 + «,> 

— + e 4 *** + e***! + — l) 4 .. 

where we distinguish the possibilities: 

(1) $a = \beta + \gamma$, $b = \gamma + \alpha$, $c = \alpha + \beta$, $\alpha, \beta, \gamma$ being different from 0, in which case 


•(74), 


/ - {? + #)* (0 + y)l (r+a)! 1 

®*^ ,C * Ct 1 • AT® 4* 


( 2 ) In all the other cases 


Finally we find 


/3! ' N* 


0,6,0 0 ( 1 Y 2 ) 


r a!/ 8 ! 

0 2 T* + 


.(75). 

.(76). 

•(77), 


if the indices $a, b, c, d$ are equal two by two, and $L_{a,b,c,d} = 0$ in any other 
case. 

* Again, if 6=1, o(ij). 














15. More complete results can be obtained for the distribution of the sampling 
moments about the mean. In order to work out the corresponding formulae, I shall 
consider first the development of the expression 


00 cA+cJb 

e*<V+ - 2 S ~r~~ t 1 * l+b U te + b 

o a!6!c! 

since the coefficient of ~ , is found to be 

ml n\ 


m!n!S 


alblcl 


,(78), 

,(79), 


the sum being extended over all the non-negative integers for which $m = 2a + b$, 
$n = 2c + b$. Therefore the coefficient considered is zero, unless $m$ and $n$ are both 
at the same time even or odd, i.e. $|m - n| = 2d$. Obviously, owing to the symmetry 
of the expression (78), we can suppose $m > n$; the sum (79) may be written down 
as follows: 






(’■) 


where $p$ stands for the greatest integer contained in $n/2$. 


.(80), 


Since we have 




U N tt ) +,f> { tl ~ ~N~) + t ) = V (<1 * + ***) ~ ^<> «*> 


-P) + + h -p) = ^ (tf + tf + 24< s ), 


tf-1 


by applying the previous result to the development on the right-hand side of (63), 
we obtain 


0/ . m! w! / l\ d+1 


p -1 

x 2, 


o’ q\(d + q)\(n-2q)V2*° 




for when there is a term following from the last term on the right-hand side of (63), 
this term is reduced by the terms corresponding to q = when such a term does 
not exist, the latter terms cancel out. 


16. The formulae obtained above (§14) show us that the semi-invariants of a 
sample from a normal population are independent, as far as the first power of $1/N$. 

As a final application of the method of associated functions, I propose to show 
now that this is a specific property of the samples from a normal population. 

In order that the sampling semi-invariants be independent, owing to the first 
formula (48), we have the condition 

$$\phi(t_1 + t_2) - \phi(t_1) - \phi(t_2) = F(t_1 t_2) \quad (82);$$

hence 

$$\phi(2t) - \phi(-2t) = 2[\phi(t) - \phi(-t)].$$

* This result gives, at the same time, the expression of the product moments of a normal distribution 
of two variates, in terms of the corresponding correlation coefficient. 



This is a well-known functional equation, and its solution is 

$$\phi(t) - \phi(-t) = 2 s_1 t,$$

which gives $\phi(t) = s_1 t + \psi(t^2)$. 

Substituting this function in the relation (82), we obtain 

$$\psi[(t_1 + t_2)^2] - \psi(t_1^2) - \psi(t_2^2) = F(t_1 t_2).$$

If we differentiate this relation with respect to $t_1$ and make afterwards $t_1 = 0$, we 
obtain 

$$2\psi'(t_2^2) = F'(0);$$

hence $\psi(t^2) = s_2\frac{t^2}{2}$ and consequently $\phi(t) = s_1 t + s_2\frac{t^2}{2}$. 
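The condition (82) can be illustrated with exact arithmetic. The sketch below is only a numerical illustration, with arbitrary values for s_1, s_2, s_3 and hypothetical function names: for a quadratic φ the left-hand side of (82) depends on t_1 and t_2 through their product alone, whereas adding a third order term destroys this property.

```python
from fractions import Fraction

def excess(phi, t1, t2):
    """phi(t1 + t2) - phi(t1) - phi(t2), the left-hand side of (82)."""
    return phi(t1 + t2) - phi(t1) - phi(t2)

# arbitrary illustrative values, not taken from the paper
s1, s2, s3 = Fraction(3, 7), Fraction(5, 2), Fraction(1, 3)

def normal(t):
    """Quadratic phi, as for a normal parent population."""
    return s1 * t + s2 * t * t / 2

def skewed(t):
    """The same phi with a third order term added."""
    return normal(t) + s3 * t ** 3 / 6

# three pairs (t1, t2) sharing the same product t1 * t2 = 6
pairs = [(Fraction(1), Fraction(6)), (Fraction(2), Fraction(3)), (Fraction(-1), Fraction(-6))]
normal_values = [excess(normal, a, b) for a, b in pairs]
skewed_values = [excess(skewed, a, b) for a, b in pairs]
print(normal_values)  # identical entries: a function of the product only
print(skewed_values)  # entries differ: (82) fails
```

In the quadratic case the common value is s_2 t_1 t_2, so that F is linear in the product, as the argument above requires.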

Conclusions. 

In this paper I have introduced new functions connected with the distributions 
of random variables and indicated a new procedure, based upon them, for obtaining 
further results in the sampling problem, although the same method may be applied 
for calculating in an easier way formulae already worked out by previous authors. 

I have shown that having reached the general expression of the associated 
functions we are able to write down any desired formula regarding the distribution 
of sample characteristics. However, a certain limit must be considered, beyond 
which the calculations become too complicated and the labour required to obtain 
some of these formulae seems out of proportion to the practical importance of any 
applications which can be made of them. The general formulae present nevertheless 
an advantage which has not been reached before (at least as far as I know) and 
this advantage is double. First they enable us to calculate any formula with any 
degree of approximation, and this, by a methodical procedure. On the other hand, 
they may be very useful for theoretical researches, for they embrace in one entirety 
all the moments—or semi-invariants—of the same order. 

Besides, by using the associated functions, I have elucidated one point regarding 
the sampling semi-invariants showing that generally they are not independent—as 
one might think; the only case in which they are independent is that of very 
large samples from a normal population. 

Note. 

Let us consider $n$ random variables $x_1, x_2, \dots x_n$ and suppose that between them 
there is the following relation: 

$$x_1 + x_2 + \dots + x_n = a.$$

In that case no real law of distribution can exist for the ensemble of these 
variables unless we consider n - 1 of them only. However, there is a characteristic 
function for the whole system of variables, which has all the properties of a general 
function such as this, except that which allows us to calculate by an integration the 
corresponding law of distribution. 





Let $F(x_1, x_2, \dots x_{n-1})\,dx_1\,dx_2 \dots dx_{n-1}$ be the distribution of $x_1, x_2, \dots x_{n-1}$ and 
$\phi(t_1, t_2, \dots t_{n-1})$ the characteristic function 

$$\phi(t_1, t_2, \dots t_{n-1}) = \int_{(D)} F(x_1, x_2, \dots x_{n-1})\,e^{t_1x_1 + t_2x_2 + \dots + t_{n-1}x_{n-1}}\,dx_1\,dx_2 \dots dx_{n-1}.$$

Now, if we make abstraction for a moment of the relation between the $x$'s, we 
have 

$$f(u_1, u_2, \dots u_n) = \int_{(D)} F\,e^{u_1x_1 + u_2x_2 + \dots + u_nx_n}\,dx_1\,dx_2 \dots dx_{n-1},$$

and putting $a - x_1 - x_2 - \dots - x_{n-1}$ instead of $x_n$, we shall find for the characteristic 
function of $x_1, x_2, \dots x_n$: 

$$f(u_1, u_2, \dots u_n) = e^{a u_n}\,\phi(u_1 - u_n,\ u_2 - u_n,\ \dots\ u_{n-1} - u_n) \quad (1).$$

Let us apply this result to the distribution of 8©i, $«r 2 ,... £© n . We know that 
the characteristic function of Svri,Sm 2t ... 8© n _ 1 is* 

* (fe,<b, <»-i) -r*M« [p.+ Sp,**]* 

and since between the S©/s we have the relation 

8©l + StSTg + ... + 8w n = 0, 

by using the formula (1) we find for the characteristic function of S© x , S© 2 , ••• 8© n 
the result given before (§2, (1)), namely: 

/(« 1 , W*.... «») = e~* p < u i [tp i e N 
where n replaces the t of the formula there given. 

APPENDIX I. 

Table of Formulae giving L^\f, the Second Order Semi-invariants of the 
Sampling Semi-invariants. Size of Sample = N. 


j * 



* See G. Darmois, Statistique mathématique (Paris, G. Doin, 1928), pp. 287-289. 
† $L_a(a, b)$ denotes approximate values of $L(a, b)$ as far as terms in $1/N^2$. 






/(») . 


4T." 


r(2) 


7(2) 
“^2, 6 * 


( 2 ) 




( 2 ) 


L 


/( 2 ) . 
^3, 4 s 




7 (2) 


(iV A 72 )* 4+a (a’ iy*)***’ 

(y ” A' 2 ) 46 + (.v ~ w*) (®* 2 * 4 '*' ® 43 ^ — A’'- 4,2 * 4 ~ 8 ^' 

* T + (J. ~ 7fi) ( l0 ' , 2»6 + 2O *3*4)-^ 0 s*2*6-^s*S<4-^5*» 2 *S. 

(y- F ! )* 8+ (Jr“ |r a ) ( 12 ' , i«6 + 30 ^,+20V)-^*a«o-^*8<'* 

210 „ 2280 8 3420 2 300 

- jP* 4 Av 2 4244 ^ A 72 **• 

(y — A" 2 } *® -*■ (j\ T - A 2 ) (®*a*4 + 9*8 !! ) + 6 (y — ^ 2 ) ,j3 ’ 

(tF’"" jV 2 ) * 7+ (a 7 ” 2v-) ( 1 2*2*6 + 30* :j s 4 ) + 36 (^y - -^5^ *8* *3 “ y* *2*5 — -y 2 »/ *3 , 

(y - A*) *h + Qy ~ (15*2*o + 45* 3 * f> + 30* 4 2 ) + (y - JJ) (00*2 2 »4+90* 2 * s 2 ) 

60 00 540 900 2 300 

jyS*2*® ^yj*B*fi ^2^2*4 ^ya *2*g yya # 

y - y*j *« + (a - A’ 2 ) (1«*2*7+63* :) *o + 105* 4 * 6 ) 

+ (y - y*) (00 * 2 2 * c + 3(i0*j *3 *4 + 90* 3 2 ) - * 2*7 - ^ * 3 * 0 


210 2520 10800 3240 , 9720 , 

J\/2 * 4 ** r ’ ~ J\ r 2 ,<?2 S '' A 7 2 *2 *3 s 4 ~ ,\72 *3 ~\r2~* 2 ' * 3 * 


iV 7 -’ 


yV 2 


N* 


jW . 
^4, 4 ' 


/ (2) , 
^4,6' 


! (jy ” ^y 2 ^ *« + (jy “ jy 2 ^ (lfi* 2 *o + 48*.,# 6 +34* 4 2 ) + (72* 2 2 * 4 +144# 2 #3 3 ) 

7\ 4 24 192 , 144 . 

4” ”4 ^y 2 J #2 ^y 2 * 2 *fi jy 2 *2 £4“" jy 2 * 2 * 3 » 

( 1 22 \ /I 21 \ 

S' ~ 2V^/ *° \]\r ~~ N 2 ) ( 2() *2*7 + 70*,* 0 +120* 4 * A ) 

4- (jr - (120* 2 2 * fi +600* 2 *,* 4 + 180*,*) 4- 240 Q - ~) sf *3 


840 


2520 


300 


4340 




“ ^2 * 2 *7 ~/ya * 3 *°Tyi “ “/yT *2*3*4 ~ ^ya * :i3 “ ' 2 V r2 * 2 3 * 3 » 

: ^^y ~ ^ 2 ^ ^^y ~ ^ya^ (24^ 2 ^8 + 9 6 * 3*7 + 194* 4 * fl + 120* A 2 ) 

+ - ^ 2 ^ (1 BO* 2 2 * g + 1080*2 *3 *r» 4- 720* 2 H 2 +1260* s 2 * 4 ) 

±( l 31 \^qa 3 xinon 2 ax 222 300 3504 9 

+ V y “ Jiry ( 480 *2 3 *4 + 1080* 2 2* 3 2) - -p « 2 * 8 - -jpj #3*7-^ya~ *2* *6 


11880 


9060 


jy * *8*3*6 ^yr 2 * 2 *4' 


, 12060 , 17640 . 43200 , . 5040 . 

«3 *4 jy t »t»t~ jy t *a *3 Jfi * ■ 


N* 





Z M-(y-^)* M+ (y-f») (2 ^' +1 ^ + ^* +125 ^ 

+ Q- (laoo^^is+aoois^+ssoj^^+iooo*,**^ 

+ (y - y) (1800«,W+eoo»,»<«)+ 120 (y - ys) ***~ IT* HH ~ 7 $ * s * 7 


eeoo 

-yj- *2*3*6 


1800 

N• 




3000 , 2400 

~A* **** " J/» 




10800 

’nr*' 


*w 


7200 

’~W 




L { Z\~ (i- yi)*l» + (y- y*) ( 30 «j*e+l 35 * 3 » 8 + 310 * 4 > 7 + 4568 6 *•) 

+ (2100j a * 3 *„+3600«2* t * 6 +300«j 8 »,+3150* S J * s +3160» 4 ***) 

+ (I-(9000»j*88»4+6400»» , 8j+ 1200*j s #() + !800 (y- j^)*, 4 ** 


270 


420 


jya *2*9 2V* * 3 *® 


24600 

iV* 


*2*3*0" 


33600 

N* 


5640 

*0*4*6“ “^r*0 *7 


18000 _ 

■ ~^a“ ** * 6 


„ 151200 2 12420 s 30600 8 50400 4 

.2ji, -- *2 Z *s**“ *3 S *«- A7-S4 *2*6 A/a *2 *3‘ 


A ya -6 jy* 

L { 1 \ - (y - y») *18 + (y- J£) ( 36 *8*io + 1 ® 0 *s* 9 + 466 * 4 *j+ 780 «s* 7 + 461 **’) 

+ (JL _ ®®\ (450*2® *g + 6300*3* »d+ 4950*4*+3600* s * 3 *7 

4 - 7200* 2 *4 *e + 4500*2*6 2 +21600*3*4* 6 ) 

+ ^ (21600* a a 8s*6 + 16300«2 ! »4 , +2400«j 3 8a+8100»3 4 +64000*2* 8 ** 4 ) 

4 7 \ /] 31 \ . 420 480 

+ fy - (21600*2**3*+5400*2* *j)+720 - y- a J * 2 ' - y* *«*">" y* *»*» 


420 

- -fi* *4*8“ 
41600 


10440 


*2**8 


N* 
88200 


45360 
N * 


- *3**0 


8400 , 49880 


0*#W « 

1 *• jy8 * 4 


^2 


*2*3*7 — 


85240 


N 2 

74800 

^0 


540000 2 309600 g . 

*2*6* ~ *8*4*5-Jya~ *2 *3*5 ^yT~ ** * 4 


A* 
75600 

a” 


*2*4*0 


*2 S *0 


i- 4 — 


788400 


A* 


r *2 *3* *4 




jV« 


APPENDIX II. 

Table of Semi-invariants of the Sampling Product Moments about their Mean. 
Tables giving $S(2^\alpha, 3^\beta, 4^\gamma)$ obtained so far. 

N.B. The solidus in the expressions below does not extend to the $s$ powers and products. 
Further for greater brevity of expression the size of the sample is taken to be $(N + 1)$. 

Weight 2 

$S(2) = N/(N+1)\; s_2$ *. 

Weight 3 

$S(3) = N(N-1)/(N+1)^2\; s_3$. 

* Formulae marked by an asterisk have certainly been given before. This statement does not 
of course cover R. A. Fisher's semi-invariants of the sampling cumulative moments. He also 
proceeds in that case to weight 12. 




Weight 4 

\[ S(4) = N(N^{2}-N+1)/(N+1)^{3}\; s_4 + 3N^{2}/(N+1)^{2}\; s_2^{2} .\,{}^{*} \]

\[ S(2^{2}) = N^{2}/(N+1)^{3}\; s_4 + 2N/(N+1)^{2}\; s_2^{2} .\,{}^{*} \]
Weight 5

\[ S(2,3) = N^{2}(N-1)/(N+1)^{4}\; s_5 + 6N(N-1)/(N+1)^{3}\; s_2 s_3 . \]

Weight 6 

\[ S(2,4) = N^{2}(N^{2}-N+1)/(N+1)^{5}\; s_6 + 2N(7N^{2}-4N+4)/(N+1)^{4}\; s_2 s_4
 + 6N(N-1)^{2}/(N+1)^{4}\; s_3^{2} + 12N^{2}/(N+1)^{3}\; s_2^{3} .\,{}^{*} \]

\[ S(3^{2}) = N^{2}(N-1)^{2}/(N+1)^{5}\; s_6 + 9N(N-1)^{2}/(N+1)^{4}\; s_2 s_4
 + 9N(N-1)^{2}/(N+1)^{4}\; s_3^{2} + 6N(N-1)/(N+1)^{3}\; s_2^{3} .\,{}^{*} \]

\[ S(2^{3}) = N^{3}/(N+1)^{5}\; s_6 + 12N^{2}/(N+1)^{4}\; s_2 s_4 + 4N(N-1)/(N+1)^{4}\; s_3^{2} + 8N/(N+1)^{3}\; s_2^{3} .\,{}^{*} \]

Weight 7 

\[ S(3,4) = N^{2}(N-1)(N^{2}-N+1)/(N+1)^{6}\; s_7 + 6N(N-1)(3N^{2}-2N+2)/(N+1)^{5}\; s_2 s_5
 + 6N(N-1)(5N^{2}-8N+5)/(N+1)^{5}\; s_3 s_4 + 36N(N-1)(2N-1)/(N+1)^{4}\; s_2^{2} s_3 .\,{}^{*} \]

\[ S(2^{2},3) = N^{3}(N-1)/(N+1)^{6}\; s_7 + 16N^{2}(N-1)/(N+1)^{5}\; s_2 s_5
 + 12N(N-1)(2N-1)/(N+1)^{5}\; s_3 s_4 + 48N(N-1)/(N+1)^{4}\; s_2^{2} s_3 . \]

Weight 8 

5(4*) = 2V*(2V*-2V t-l)*/(2V 4 l) 7 * 8 442V(2V*-2V4l)(72V*-42V44)/(2V4l) , « J «, 

+ 48 N(N - 1 ) 3 (2V* - 2V 4 1)1 (N 4 1 )«*,*, 4 22V (172V 4 - 432V 3 4 782V* - 52 N 4 n)/(N 4 1)V 
4 122V (172V 3 -202V* 4 262V-6)/(2V I l) 6 « s *s 4 -( 722V (2V - 1)* (32V •• 2)/(N 4 l) 3 *^* 

+ 242V (42V* - N +1 )/(N 4 1 )*«/*• 

5(2*,4) = 2V 3 (2V* -2V4 1 )/(N 4 1) 7 * 8 4 22V*(132V*- 102V 4 10)/(2V + l)%s, 

-I 82V (A- 1 )(5A r * - 5 N 4 2)/(N 4 1 ) 6 * 3 « 6 + 2A 7 (172V 3 - 202V* 4 26 N - 6)/(2V 4 1)V 
4l62V(112V*~52V4 5)/(2V4l)*« 2 *a 4 I-242V(2V - 1)(6A - 5)/(2V4l) 6 W i 12N*/(N+!)*«„**. 
8(2,3*) = N*(N — 1)*/(A7-| 1) 7 *„ + 21N*(N - 1)*/(2V | 1 )•#,«,-4 62V(A 7 - 1)*(82V - 3)/(2V 4- !)*«»*« 
4-92V(2V- l)*(32V-2)/(2V4- 1)** 4 * + 182V(2V- 1)(62V • 5)/(2V4 1) 6 « 2 ** 4 
4-182V(2V-l)(92V-ll)/(2V f 1 ) 6 «,*„* + 362V (N - 1)/(N ) l) 4 a 2 4 . 

5 (2*) — N*/(N 4 l) 7 « g 4 242V 3 /(2V 4- l) e s a s 8 4 32N*(N - ] )/(2V 4 l) fl « s « r , 

4- 82V (42V* - N 41 )/(2V 4 1)« « 4 * 4 1442V 3 /(2V 4 1 ) 6 «,*« 4 + 962V (2V - 1 )/(2V 4 1 )* .* 2 * 3 * 

4-482V/(2V 4- l) 4 « t 4 *. 

Weight 9

5(2,3,4)= N 3 (N — l)(N* — N + 1 )/(2V 4 1)“ «„ 4 22V* (2V - 1 ) (162V* - 132V 4 13 )/( 2 V 4 -1) 7 
4 62V(2V-1)(122V 3 -212V* (• 162V-4)/(2V 4 1)>* ><6 
4-22V(2V - 1) (582V 3 —91 AT* 4 1032V - 30)/(2V 4 l) 7 * 4 s 5 
4 122V (2V~ 1)(162V*- 192V4 16)/(2V4 1) 4 V«5 

4 122 V (2V - 1) (642V* - 942V 4 67)/(2V 4 1 )«*„*, * 4 4 362V (2V - 1 )* (52V - 7)/(2V 4 1 )*«,* 

4 3602V (2V- 1)(22V- 1)/(2V4 1)**,\- 

5 (3 3 ) = 2V 3 (2V - 1)*/(2V 4 1)% 4 272V* (2V - 1 ) 3 /(2V 4 1 ) 7 s,s, 4 272V (2V - 1 )* (32V - 1 )/(2V 4 1 ) 7 « 3 *, 

4 272V(2V - l)*(42V-3)/(2V4 l) 7 « 4 « s 4 542V(2V-l)*(42V-3)/(2V 4 1)%**, 

41622V (N - 1)(52V- 7)/(2V 4 1 )*«»«,««4 362V(2V - 1 )(72V* - 162V |- 1 1)/(2V 4 l) 4 *, 3 
4 1082V(2V - 1) (62V — 7 )/(2V 4 1 ) S «,V- 

5 (2», 3) - 2V 4 (2V - 1)/(2V 4 1 ) 8 «„ 4 302V 3 (N - 1 )/(2V 4 1 ) 7 s, 4 22V* (2V - ]) (312V - 22)/(2V 4 l) 7 « s «. 
4 122V(2V- l)(92V*-52V f2)/(2V4 1 ) 7 « 4 * s 4 2402V*(2V - 1)/(2V 4 1) 4 «,**6 
4 3602V(2V— 1)(22V- 1)/(2V 4 l)*«,* a « 4 4 242V(2V- 1)(52V - 7)/(2V 4 1) 4 «, 3 
4 -4802V (2V - 1)/(2V 4 1)*«,V 


* See footnote on p. 103. 



N. St. Georg-escu 


105 


Weight 10 

8(2, 4*) -2V*(2V* - 2V 4 1)*/(2V +1)»« 10 442V 3 (2V 3 -2V4 1)(11 N* -8 2V 4 8 )/(2V 4 
4 (2V - 1)(2V» - 2V 41) (7 2V* -7 2V 4 4 )/(N 41)*«„®, 

4 42V (522V* - 1342V* + 2492V 3 - 2122V 3 4 1212V - 24)/(2V 4 l)*« 4 s, 

+ 42V(2V-1) (292V 4 - 582V* 4 842V* - 642V 4 17)/(2V 41)*® ( * 

4 42V(1492V 4 - 2142V* + 2882V* - 1302V 4 5 6)/(2V 4 l) 7 s 3 *s e 
496# (2V - 1)(212V* - 372V 3 + 342V - 16 )/(2V 4 1) 7 « 3 « 3 « 6 
4 722V (212V 4 - 442V* + 762V* - 482V 4 16)/(2V + 1 ) 7 s,s 4 3 
+ 722V(2V- 1) (222V*-582V 3 + 672V-29)/(2V + 1) 7 *,*« 4 
4 482V (592V* - 622V* + 802V - 18)/(2V + 1 )««,*«« 

+ 1442V (2V - 1)(292V* - 452V + 19)/(2V + 1)«® 3 *® 3 3 + 1922V (42V* - 2V 4 1)/(2V 4l) 3 #,'. 

S(3 3 ,4) = 2V*(2V - 1) 3 (2V» - 2V + 1)/(2V +1) 3 « 10 4 32V 3 (2V - 1) 3 (132V 3 - 112V + 11 )/(2V + 1 )"«,*„ 

+ 62V(2V- 1) 3 (192V* - 312V 3 + 252V - 6)/(2V + 1 )*«,«, 

4 32V(2V - 1) 3 (652V* - 1192V* + 1252V - 42)/(2V + 1)*« 4 «, 

-(■ 62V (2V - 1) 3 (202V* - 342V 3 + 432V -- 15)/(2V + l)*s 5 * 

+ 182V (2V - 1) (262V* - 442V 3 + 332V - 14)/(2V + l) 7 ®, 3 #, 

+ 362V(2V - 1)(562V* - 1332V 3 + 1312V -58)/(2V 4 l) 7 «„«,«„ 

4 542V (2V - 1)(222V* - 582V* + 672V - 29 )/(2V + 1) 7 ®,« 4 * 

4 542V (2V - 1)(332V* - 1052V* + 1252V - 57)/(2V H i) 7 « 3 *s 4 
4 362V(2V- 1)(642V*-912V4 48)/(2V4 1)*« 3 *® 4 

4 1082V(IV - 1)(382V*- 792V + 43)/(2V + 1 )*»»*«„* + 2162V(2V - 1)(22V - 1)/(2V + l) 3 ®, 3 . 

8 (2*, 4) = 2V 4 (2V* - 2V 41 )/(2V + 1)»« 10 + 62V* (72V* - 62V + 8)/(2V 41 )*«,«„ 

4- 42V* (2V - 1) (232V* - 232V + 14)/(2V 4 1 )* s 3 s 7 

■) 22V(1032V 4 - 1462V* 4 1992V* - 682V I- 16)/(2V 4-1 )%#„ 

4 122V(2V- 1)(92V* -92V 3 4-82V-2)/(2V + 1)"* 3 * + 602V*(92V*-62V4-6)/(2V4-l) 7 ®,*®, 

4 962V(iV - 1 )(172V* - 162V 4- 6)/(2V 4 l) 7 * s « a « s 4 242V(592V* - 622V* 4 802V - 18)/(2V 4 l) 7 s 3 « 4 * 
+ 242V(2V - 1)(642V*-912V 4 46)/(2V 4-1 ) 7 ® 3 *® 4 4 4802V(52V* - 22V42)/(2V 4-1)*«,*« 4 
4-7202V(2V - 1 )(42V- 3)/(2V 4- 1) 6 ® 2 V4- 5762V*/(2V4-1)% 6 . 

<S (2*. 3*) = 2V 4 (2V - 1 )*/(2V 4-1 )*®,„4 372V*(2V - 1 )*/(2V 4-1 )*s 3 ® 8 

-462V*(2V - 1)*(172V- 10)/(2V4 1) 9 « 3 «,4 32V(2V- l)*(612V*-442V4l2)/(2V4l) 9 « 4 «» 

4- 22V(2V - 1)*(592V* - 362V + 18)/(2V 4- 1) 9 « 3 * 4- 62V*(2V - 1)(672V - 64)/(2V 4- I) 7 ®, 3 ®, 

4 242V(2V - 1)(712V* - 1042V 4- 27)/(2V 4- l) 7 s 3 * 3 ® 6 
4- 362V (2V - 1) (292V* - 452V 4-19 )/(2V 4 1 ) 7 « 3 « 4 * 

4 362V (2V - 1) (382V* - 792V 4 43)/(2V 4 1 ) 7 ® 3 ** 4 4 3602V (2V - 1) (42V - 3)/(2V 4 1 )*«,*«« 

4 2162V(2V-1)(132V-17)/(2V t 1) 8 ® 3 *« 3 *4 2882V(2V - 1)/(2V 4 l) 3 *. 3 . 

g(2«) _ jys/(2V 4 1)*« 10 4 402V 4 /(2V 4 l) 8 « 3 s„ 4- 802V*(2V - 1 )/(2V 4 1)*® 3 ®, 

4402V*(52V*-22V 42)/(2V4 1 )*««*„4 162V(2V- 1)(62V* 4 1 )/<2V4 1)V 
4 1802V*/(2V 4 l) 7 « 3 *«, + 12802V*(2V - 1 )/(2V 4 1) 7 ® 3 « 3 «, + 3202V(42V 3 - 2V 4 l)/(A r 4 1) 7 « 3 « 4 3 
44802V(2V-1)(22V-1)/(2V H) 7 ® 3 3 « 4 t 19202V*/(2V4 1) 8 ® 3 *® 4 
4 19202V (2V - 1)/(2V 4 l) 8 s 3 *® 3 * + 3842V/(2V 4 1) 3 « 3 3 . 

Weight 11 

S (3,4 3 ) = 2V* (2V - 1) (2V* - 2V 4 1 ) 3 /(2V 4 1) 1# « u 

442V*(2V - 1)(2V 3 -2V 4 1)(132V* - 102V 4 10)/(2V 4 l) 3 ®,#, 

4 122V(2V - 1)(2V* - 2V 4 1)(132V* - 242V* 4 172V - 4)/(2V 4 1 )»«,«„ 

4 42V (2V - 1) (802V* - 226 N* 4 4082V* - 3672V 3 4 2482V - 48)/(2V 4 1 )*« 4 ®, 

4 42V(2V - 1)(1122V* - 3202V 4 4 5792V* - 6082V 3 4 3732V - 87)/(2V 4 1) 3 « 6 ®, 

4 122V(2V- 1)(732V 4 - 1142V*4 1522V 3 - 762V 4 32)/(2V 4 l) 9 ®,*®, 

4 242V(2V - 1 )( 1692 V 4 - 4342V* 4 5732V* - 410# 4 154)/(2V 4 1)* «,«>«, 

4 242V(2V -1) (2682V 4 - 6522V* 4 10792V* - 8362V 4 304)/(2V 4 1)* #*« 4 ®, 




4 722V(2V - 1)* (66.2V* - 1462V* + 1672V - 97 )/(2V + 1) 8 «A 
4 242V (2V -1) (2162V 4 - 7602V* 4 12902V* - 11622V 4 431)/(2V 41 )*«„«<* 

+ 482V(2V- 1)(1162V* - 1552V*4 1822V-72)/(2V + 1)’«A 
4 1442V (2V - 1)(1902V* - 4292V* 4 4962V - 200)/(2V + 1) 7 *,**»®4 
+ 1442V (2V - 1)(322V* - 2682V* 4 3042V - 126)/(2V + l) 7 « a « s * 

+ 2882V (2V - 1) (412V* - 502V 4 23 )/(2V + l)*s a V 

<8 , (2*,3,4) = 2V 4 (2V - 1)(2V* - 2V 4 1)/(2V + l) 10 « n + 442V*(2V - 1)(2V* - 2V + 1)/(2V + 1 )•«,«, 

+ 22V»(2V - 1) (712V 3 - 1302V* + 1092V - 38)/(2V -|-1 )•«,«„ 

4 22V (2V - 1) (1672V 4 - 2622V 3 + 3262V* - 1262V 4 24 )/(2V + 1 )• « 4 «, 

4 42V(2V - 1)(1532V 4 - 2892V* |- 2772V* - 1242V + 30)/(2V +1)*« 5 «, 

+ 42V*(2V - 1)(2002V* - 1492V 4 140)/(2V 4 l)*s a **j 
4 482V(2V - 1)(722V* - 1192V* 4 892V - 21 )/(2V + 1 )*«„*,«, 

4 82V(2V - 1)(6922V* - 10652V* 4 11182V - 336)/(2V + l)*« a « 4 « a 
4 242V(2V - 1) (1732V* - 3292V* + 2722V - 100)/(2V + 1)» 

4 242V (N - 1) (1902V 3 - 4292V* 4 4952V - 200)/(2V + 1 )* # s * 4 * 

4 1442V (2V - 1)(332V* - 202V 4 14)/(2V + l) 7 * a \ 

4 482V (2V — 1) (3612V* - 3942V + 223)/(2V + 1 ) 7 w a *« 3 * 4 

4 2882V (2V- 1)(292V*-682V+ 40)/(iV t-l) 7 « a « 3 * + 43202V(2V-l)(22V-l)/(2V + l) , « a 4 «»- 

S(2,3*) = 2V 4 (2V-1)*/(2V H l) l0 « u 4 452V*(2V-l) 3 /(2V + l)»« a s, )-92V*(2V-1)*( 172V-9)/(2V +1)* 
4 272V(2V - 1) 3 (11JV* - 92V 4 2)/(2V (- l) 9 .s 4 «, 4- 92V(2V - 1)*(492V* - 362V + 12)/(2V 4 l)'* s *, 
4- 542V*(JV - 1)*(122V - 11)/(2V 4- l)*s a **7 + 642V(2V - 1)*(632V* - 622V 4- 21 )/(2V 4- l)** s «,«, 

4-542V( 2 V — l)*(632V*-942V4 21)/(xV4- l)*# a *4*5 

4 642V (2V - 1) (462V* - 1372V* l- 1462V - 61 )/(2V f 1 )* s t \ 

4 542V(2V - 1 )( 822 V* - 2682V* + 3042V - 126 )/( 2 V 4 l) 8 4 8 s 4 “ 

4- 1082V(2V - 1)(302V* - 532V 4- 21)/(2V 4- l) 7 « a 3 »* 

4- 6482V(2V — 1)(292V* - 682V 4- 40)/(2V f 1)’* a « a « 4 

4- 6482V(2V - 1)(132V* - 362V + 27)/(2V 4- l) 7 « a * a * 4- 12962V (2V - 1)(62V - 7)/(2V 4-1)*s, 4 *,. 


Weight 12 

ft (4*) _ 2V* (2V* - 2V 4- 1)*/(2V 4-1 ) 11 « la 4- 62V*(2V* — 2V + 1)*(112V* - 82V 4- 8 )/(2V 4-1 ) 10 « a #, 0 
4- 162V (2V - 1) (2V* - 2V 4-1 )* (132V* - 132V 4- 4)/(2V 4-1 ) 10 * s « a 
4- 122V (2V* - 2V 4-1) (412V* - 1152V 4 4- 2222V* - 1962V* + 1132V - 24)/(2V 4-1 ) 10 * 4 * 8 
4- 482V (2V - 1)(2V* - 2V 4 1) (162V 4 - 322V* 4- 642V* - 472V 4- 13)/(2V 4 1) 10 « 5 « 7 
+ 62V(772V 7 - 2942V* 4-7172V* - 10882V 4 4 12332V* - 8672V* 4 3742V - 66)/(2V 4 l) 10 *,* 
4 362V (2V* - 2V 4 1)(412V 4 - 602V 3 4 822V* - 382V 4 16 )/(2V 4 1)V« 8 
4 1442V (2V - 1) (2V* - 2V 4 1) (512V* - 892V* 4 862V - 44)/(2V 4 1)' 

4 242V (5742V* - 19682V* 4 45152V 4 - 57882V* 4 52232V* - 26222V 4 640)/(2V 4 1 ) 3 « a « 4 «, 
4 722V (2V - 1)( 1092V* - 2982V 4 4 5802V*-6922V* 4 4492V - 140)/(2V 4 1 )*«,#** 

4 722V(2V - 1)(1152V* - 4442V 4 4 8362V* - 9232V* 4 6302V - 202)/(2V H l)»« s *«, 

4 2882V(A’ - 1)(882V* - 3362V 4 4 7162V* -8872V* 4 6212V - 196)/(2V4 1)** 8 « 4 * 6 
4 82V(7092V* - 31532V* 4 87592V 4 - 134682V* 4 132962V* - 71762V 4 1735)/(2V4l)*« 4 * 
4.722V(1592V* - 3782V 4 4 6722V* - 5942V* 4 3962V - 96)/(2V 4 1 )*«,*«« 

48642V(2V- 1)(852V 4 - 2132V* 4 3042V*- 2582V 4'88)/(2V 4 l)** a **>®t 
4 722V (7092V* - 21322V 4 4 46142V* - 48022V* 4 30822V - 762)/(2V 4 1 )*#,**«* 

4 4322V (2V - 1 ) (2762V 4 - 9682V* 4 16272V* - 13872V 4 482)/(2V 4 
4 722V (2V - 1)(1732V 4 - 8112V* 4 14912V* - 12732V 4 452)/(2V 4 1)* V 
4 8642V (572V 4 - 1012V* 4 1662V* - 942V 4 29 )/(2V 4 1 ) 7 
48642V (2V- 1) (1282V*-2922V* 4 2692V- 106)/(2V4 l) 7 « a V 
4 8642V (112V* - 82V* 4 102V - 2 )/(2V 4 1 )*«,*• 





S(V)-N*(N-WHN + l) u « u + 54#*(#-1 )«/(# + 

+108#*(# - 1)*(2# -1 )/(# +1 )“«,«• + 27#(# -1)*(17# 4 - 16# + S)/(# +1)“* 4 * 8 
+ 108#(# -1)*(7 #* -6 N + 3 )/(N +1)“«,«, + 27# (# - 1)«(17 # 4 -13 N + 9 )/(# +1 )“«,* 
+ 27#*(# - 1)»(37 N - 33)/(# +1 )•«.»«. + 324#(# - 1)»(19#* - 29 N + 6 )/(# + l)»«.« t « 7 
+162#(#- 1)*(66#* -118 N + 46)/(# +1 
+ 108#(# - 1)* (69#* - 102# + 63 )/(# +1 )•«,*„* 

+ 108# (# - 1)*(82#* - 236#» + 242# - 81)/(# + 1 )»«,**« 

+ 324#(# - 1)* (76 #» - 248#* + 295# - 138)/(# +1)»«»« 4 «* 

+ 27# (# - 1)(173# 4 - 811#* + 1491#* - 1273# + 462)/(# +1)»« 4 » 

+ 108# (# - 1)»(71#* - 121# + 42)/(# + 

+ 648#(# - 1)*(79#* - 186# +114)/(# + 1)» *,*«>** 

+ 486# (# - l) 1 (63#* - 164# + 125)/(# + 1 )*«.*«/ 

+ 972# (# - 1)(99#* - 391#* + 633# - 257)/(# + 1)» «,«,*»« 

+ 162# (# - 1)(87#> - 333#* + 493# - 263)/(# +1)» #/ 

+ 972# (# - 1) (23#* - 57# + 38)/(# + 1)’ «/«« 

+ 648# (# - 1)(103#" - 304# + 233)/(# + 1) 7 *,V + 648# (# -1)(5# - 7 )/(# + 1 )•«,*• 

S( 2*) - #•/(# + l)“* la + 60#*/(# + l) 10 *,*!«+ 160# 4 (#— 1 )/(# + 1)“«,«, 

+ 240#* (2#* - # + 1 )/(# + 1) 10 « 4 * 8 + 96#*(# - 1)(7# 2 + 2 )/(# + 1) 10 «,«, 

+ 4#(113# 4 - 68#’ + 68#* - 8# + 8 )/(# + l) w « 8 * + 1200# 4 /(# + 1 )»****» 

+ 4800#*(# — l)/(# + l)*s 8 « 8 a 7 + 2400#*(7#“ —2# + 2)/(# + l)*s 8 s 4 s 8 
+ 960# (# - 1)(6#» + l)/(# + 1)’W + 160#* (# - 1)(31# - 22 )/(# + l)*s t \ 

+ 1920# (# - 1) (9#* - 5# + 2 )/(# + 1 )•«,«**„ + 480# (11#* - 8#* + 10# - 2 )/(# +1)» « 4 » 
+ 9600#*/(# + l)*« s % + 38400#*(# - l)/(# + 1 )««,*«, « 6 
+ 9600# (4#* - # + 1 )/(# + 1 )* «,V + 2880# (# - 1) (2# -1 )/(# + 1 ) 4 *,*„**« 

+ 960# (# - 1) (6# - 7 )/(# + 1 )* a, 4 + 40320#*/(# + l) 7 a, 4 a 4 
+ 38400#(# - l)/(# + 1) 7 «,V + 3840#/(# +l) 8 a, 6 . 

Normal Population 

S( 4 4 ) - 6012# (272iV* - 370#* + 504 N* - 304 N + 8 0)/(# + 1 ) 8 V** 

8( 3®) - 3265020 N(N - 1)(4# 8 - 13# + 11 )/(# + !)•*,•+- 


* See footnote on p. 103. 
† Stated to have been previously determined.
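
The lowest-weight entries of the table can be checked by exact enumeration. The sketch below is an illustration only and is not part of the memoir. It assumes that S(2), S(3) and S(4) are the means of the sample moments m_2, m_3, m_4 taken about the sample mean, that S(2^2) is the variance of m_2, that s_2, s_3, s_4 are the population semi-invariants, and that the sample consists of N + 1 independent draws. The function names and the toy populations are hypothetical.

```python
from fractions import Fraction
from itertools import product


def central_moment(values, r):
    """Return the r-th moment of `values` about their own mean (divisor len(values))."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** r for v in values) / n


def check_low_weights(population, n):
    """Compare enumerated and tabulated values of S(2), S(3), S(4), S(2^2).

    `population` is a list of equally likely integer values (a toy assumption);
    `n` is the sample size, so that N = n - 1 in the notation of the table.
    """
    big_n = n - 1
    mu2 = central_moment(population, 2)
    mu3 = central_moment(population, 3)
    mu4 = central_moment(population, 4)
    # Population semi-invariants up to order 4.
    s2, s3, s4 = mu2, mu3, mu4 - 3 * mu2 ** 2

    # Every ordered sample of size n is equally likely under independent draws.
    samples = list(product(population, repeat=n))
    count = len(samples)
    m2 = [central_moment(s, 2) for s in samples]
    m3 = [central_moment(s, 3) for s in samples]
    m4 = [central_moment(s, 4) for s in samples]
    mean_m2 = sum(m2) / count
    observed = {
        "S(2)": mean_m2,
        "S(3)": sum(m3) / count,
        "S(4)": sum(m4) / count,
        "S(2^2)": sum(x * x for x in m2) / count - mean_m2 ** 2,
    }

    d = Fraction(big_n + 1)
    predicted = {
        "S(2)": big_n / d * s2,
        "S(3)": big_n * (big_n - 1) / d ** 2 * s3,
        "S(4)": big_n * (big_n ** 2 - big_n + 1) / d ** 3 * s4
        + 3 * big_n ** 2 / d ** 2 * s2 ** 2,
        "S(2^2)": big_n ** 2 / d ** 3 * s4 + 2 * big_n / d ** 2 * s2 ** 2,
    }
    return observed, predicted


if __name__ == "__main__":
    obs, pred = check_low_weights([0, 1, 3], 3)
    for key in obs:
        print(key, obs[key], pred[key])
```

Exact rational arithmetic is used so that agreement is an identity rather than a rounding coincidence. The enumeration grows as the population size raised to the power n, so it is only practical for very small cases.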



A PRELIMINARY CLASSIFICATION OF ASIATIC RACES 
BASED ON CRANIAL MEASUREMENTS. 

By T. L. WOO, Ph.D. and G. M. MORANT, D.Sc. 

(1) Introduction. In a paper published in Biometrika in 1921* Miss M. L.
Tildesley described a series of Burmese skulls, and previously published data
relating to other Oriental series were cited. Three years later a paper by one of
the present writers in the same Journal gave original measurements of a Nepalese
and of a Tibetan series and the comparative material used was extended considerably.
Comparisons by the method of the coefficient of racial likeness led to results
which appeared to be suggestive, but it was clear that nothing approaching even
a preliminary classification of Asiatic races could be obtained by these means while
the samples were so few in number and, generally, so small in size. Since 1924
the measurements have been published of several new Asiatic cranial series.
Professor Harrower's paper of 1926 deals with Southern Chinese and Tamil skulls†,
Professor Black's of 1928 with Northern and Prehistoric Chinese, and Dr von Bonin's
of 1931 with two Javanese, two Filipino, one Dayak and one Andamanese series.
The new data incorporated in these three studies are ample enough to warrant a
new survey of the craniology of Asia. All the latest material has been dealt with
on biometric lines in the papers cited and use is made below, without further
acknowledgment, of a number of statistical constants previously published. The
majority of those which are not original are taken from Dr von Bonin's paper and
we were greatly indebted to him for permission to use his results before their
publication. In his paper a fairly complete comparison has been made on biometric lines
of the best Oriental series at present published. Our purpose has been to extend
that examination to include all the best series available for the whole of the
continent. There is only one good Asiatic cranial series which we have omitted
consciously: this is one of Armenians recently described by Professor Bunak‡, for
it is clear that the type has closer affinities to some from the south-east of Europe
than to any known Asiatic ones. On the other hand, we have included two non-Asiatic
series: one is from the Aleutian Islands and the other is composed principally
of Kalmucks from Astrakhan. Both these are closely allied to Siberian types.
Means for several of the series were obtained by pooling the measurements given
by different craniometricians. Care was taken to use these data with as much

* References to all the papers mentioned here will be found in the following section of this paper.

† We have not used the series of Hylam Chinese described by Professor Harrower in Biometrika,
Vol. XXB, as it is suspected that the crania were artificially distorted.

‡ Crania Armenica, Moscow, 1927. This is a supplement to the Journal Russe d'Anthropologie,
Vol. XVI.





accuracy as possible and many doubtfully defined measurements had to be rejected. 
Some of the previously published pooled means have been revised and others are 
now given for the first time. A total of 26 male series was collected and the shortest
of these is made up of 31 crania. Nearly all the means used are based on 20 or
more crania and those which could only be given for fewer than 10 specimens were 
ignored. This sample of racial types cannot be supposed a random one taken from 
all which it may be possible to distinguish among Asiatic peoples. More than half 
of the series come from Oriental regions; India and Siberia are poorly represented 
and there are no data whatever referring to the peoples of the south-west of the 
continent. 

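The main purpose of the present paper is to present the coefficients of racial
likeness between all pairs of the 26 series available. With the usual notation, the
form of the crude coefficient used is

\[ \frac{1}{M}\,\Sigma(\alpha) - 1 \pm .67449\sqrt{\frac{2}{M}}, \qquad \alpha = \frac{n_s n_{s'}}{n_s + n_{s'}}\left(\frac{M_s - M_{s'}}{\sigma}\right)^{2}. \]

The reduced coefficient is defined to be

\[ 50 \times \frac{\bar{n}_s + \bar{n}_{s'}}{\bar{n}_s \bar{n}_{s'}}\left[\frac{1}{M}\,\Sigma(\alpha) - 1\right] \pm 50 \times \frac{\bar{n}_s + \bar{n}_{s'}}{\bar{n}_s \bar{n}_{s'}} \times .67449\sqrt{\frac{2}{M}}. \]
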


The reduced coefficient is designed to give a measure of racial divergence which 
does not depend upon the sizes of the samples compared, and the classification 
suggested is derived from its values. The usual 31 characters, or as many of them 
as are available in the case of a particular comparison, were used in calculating 
these constants. Crude coefficients are given for all these measurements and for the 
indices and angles alone, but only those of the first kind were reduced. The 
standard deviations of the long Egyptian E series of 26th—30th Dynasty crania 
described by Professor Karl Pearson and Miss A. G. Davin were used throughout *. 
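
The two coefficients can be sketched in a few lines. This is an illustration, not the authors' computing scheme: it assumes the usual form of the coefficient, in which each of the M characters contributes alpha = n_s n_s' / (n_s + n_s') x ((M_s - M_s') / sigma)^2, the crude value is the mean of the alphas less one with probable error .67449 sqrt(2/M), and the reduced value multiplies both by 50 (n̄_s + n̄_s') / (n̄_s n̄_s'), where n̄ is the mean number of skulls per character. All function names and the figures in the usage example are hypothetical.

```python
import math

PROBABLE_ERROR_FACTOR = 0.67449


def crude_coefficient(means_a, means_b, n_a, n_b, sigmas):
    """Crude coefficient of racial likeness and its probable error.

    Each argument is a sequence with one entry per character: the two series
    means, the numbers of skulls on which they are based, and the standard
    deviation adopted for that character.
    """
    m = len(sigmas)
    alphas = [
        (na * nb / (na + nb)) * ((ma - mb) / sd) ** 2
        for ma, mb, na, nb, sd in zip(means_a, means_b, n_a, n_b, sigmas)
    ]
    crude = sum(alphas) / m - 1.0
    probable_error = PROBABLE_ERROR_FACTOR * math.sqrt(2.0 / m)
    return crude, probable_error


def reduced_coefficient(crude, probable_error, n_a, n_b):
    """Scale the crude coefficient by 50 (n̄_a + n̄_b) / (n̄_a n̄_b)."""
    nbar_a = sum(n_a) / len(n_a)
    nbar_b = sum(n_b) / len(n_b)
    factor = 50.0 * (nbar_a + nbar_b) / (nbar_a * nbar_b)
    return factor * crude, factor * probable_error


if __name__ == "__main__":
    # Hypothetical means for two characters; not taken from any series in the paper.
    n_a, n_b = [30, 30], [30, 30]
    crude, pe = crude_coefficient([180.0, 140.0], [183.0, 140.0], n_a, n_b, [6.0, 5.0])
    print(crude, pe)
    print(reduced_coefficient(crude, pe, n_a, n_b))
```

The scaling factor is what removes the dependence on sample size: two series with the same differences in means give the same reduced value whether they are long or short.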


(2) Measured Series of Asiatic Skulls. Male adult series are the only ones 
dealt with in the present paper. Experience has shown that comparisons by the 
method of the coefficient of racial likeness seldom lead to satisfactory results in 
the case of very small samples. All the series used have the majority of their 
means based on 25 or more crania. It is generally an advantage to have all the 
crania assigned to a single racial type measured by the same worker, but in some 
cases it is only possible to obtain a large enough sample by pooling the measurements
provided from different sources. The need for restricting comparisons in the
case of a particular measurement to pairs determined by using identically the same
technique is, of course, one of prime importance. Definitions of all those used
are given in Biometrika, Vol. XXB. 1928, pp. 362–364, where they are denoted by
the biometric index letters, and the numbers in Martin’s Lehrbuch are also given. 
There are 19 absolute and 12 indicial and angular measurements used in computing 
the coefficients, but in some cases alternatives can be used as the vertical height 


* “On the Biometric Constants of the Human Skull.” Biometrika, Vol. XVI. 1924, pp. 328–363.




preferred to another as 100 B!jL to 100 H/L. The preference in these cases is 
given to the one which is most frequently available. Most of the alternatives have 
been provided for the Asiatic series originally described in Biometrika. For most 
of the other series they are not given and several of the coefficients can only be
calculated for a smaller number of characters than the total 31. Unless otherwise stated,
the means for the series previously published in papers in this Journal have been 










accepted without modification. Where these were derived from other sources the
full references were given and they are not duplicated below. Pooled means which
have not previously been published are given in our Table I. The map (p. 110)
shows the approximate districts from which the material was derived. The following
abbreviations are used in the list below: A.f.A. = Archiv für Anthropologie;
A.S.D. = Die anthropologischen Sammlungen Deutschlands, being supplements to
A.f.A.; Bm. = Biometrika.

(i) Aetas. Gerhardt von Bonin: “Beitrag zur Kraniologie von Ost-Asien.”
Bm., Vol. XXIII. 1931, pp. 52–113. The Aetas (Philippine Islands) are found in
the interior of Luzon Island and also in the islands of Mindoro, Panay and Negros
and in the north-east of Mindanao. The tribe is a small one. The people are short
of stature and they are generally spoken of as undoubted Negritos. Koeze gave
measurements of a series of skulls at Leiden and 33 male specimens were re-measured
by von Bonin.

(ii) Aino. Koganei's means of a series from Yezo and Kunashiri are quoted
in Bm., Vol. I. 1902, p. 426. The calvarial height is the Frankfurt vertical measurement
(H) as stated there and not the basio-bregmatic (H′) it was assumed to be
in some later papers. The index 100 B/H is {101.2 (88)} in place of 98.8. We have
omitted the orbital breadth and index and the palatal measurements, as they
are inadequately defined, and added means for fml (35.7 (76)), fmb (30.2 (81)),
100 fmb/fml (84.6 (76)), N∠ ({70°.2 (69)}) and A∠ ({71°.2 (69)}). The profile angle
is assumed to have been measured from the prosthion and not from the alveolar
point. There are 88 male skulls.

(iii) Andamanese. Pooled means of various short series, of which one has been
re-measured by von Bonin, are given by him in Bm., Vol. XXIII. 1931, pp. 84–85.
There are 34 male skulls.

(iv) Aleuts . Measurements of male Aleutian crania were taken from the 
following sources: 

(a) Aleš Hrdlička: “Catalogue of Human Crania in the United States National
Museum Collections.” Proceedings of the United States National Museum, Vol. LXIII.
1924, pp. 1–51 (25 crania).

(b) A. Tarenetzky: “Beiträge zur Skelet- und Schädelkunde der Aleuten,
Konaegen, Kenai und Koljuschen mit vergleichend anthropologischen Bemerkungen.”
Mémoires de l'Académie Impériale des Sciences de St Pétersbourg, VIIIe
série, T. IX. 1900, pp. 1–73 (7 crania).

(c) Herman F. C. ten Kate: Zur Craniologie der Mongoloiden: Beobachtungen
und Messungen. Diss., Berlin, 1882 (1 cranium).

(d) George Montandon: “Craniologie Paléosibérienne. Seconde Partie.”
L'Anthropologie, T. XXXVI. 1926, pp. 447–542 (2 crania, omitting No. 4829 which
is deformed).

These skulls came from several different islands of the Aleutian archipelago 
and one from Kodiak Island is included. Pooled means are given in Table I below. 
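
Pooling short series of this kind amounts to weighting each source mean by the number of crania behind it. The sketch below is illustrative only: the group sizes are those listed above for the Aleut sources (25, 7, 1 and 2 crania), but the means are hypothetical placeholders, and in practice the number available differs from character to character. The function name is an assumption.

```python
def pooled_mean(groups):
    """Return (pooled mean, total n) for a list of (mean, n) pairs.

    Each source mean is weighted by the number of crania on which it is based.
    """
    total_n = sum(n for _, n in groups)
    if total_n == 0:
        raise ValueError("no crania available for this character")
    weighted_sum = sum(mean * n for mean, n in groups)
    return weighted_sum / total_n, total_n


if __name__ == "__main__":
    # (hypothetical mean in mm, number of crania) for sources (a) to (d)
    aleut_sources = [(180.0, 25), (182.0, 7), (179.0, 1), (181.0, 2)]
    mean, n = pooled_mean(aleut_sources)
    print(f"{mean:.1f} ({n})")
```

The printed form follows the convention used in the list, with the number of crania in brackets after the mean.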



112 Classification of Asiatic Races 

(v) Buriats. Measurements of male Buriat crania were taken from the following 
sources: 

(a) Aleš Hrdlička: loc. cit., pp. 46–47 (19 crania).

(b) Michael Reicher: “Untersuchungen über die Schädelform der alpenländischen
und mongolischen Brachycephalen.” Zeitschrift für Morphologie und
Anthropologie, Bd. XV. 1913, Tabelle 3 a (15 crania).

(c) Julius Fridolin: “Burjaten- und Kalmückenschädel.” A.f.A., Bd. XXVII.
1900–1902, S. 304–305 (7 crania).

(d) Herman F. C. ten Kate: op. cit. (4 crania). A few additional measurements
of these specimens which we were able to use are given by Haberkorn in Zeitschrift
für Ethnologie, Bd. X. 1878, S. 307.

The localities from which the majority of these Buriat skulls were obtained are 
known and they all lie in a comparatively small area round the southern extremity 
of Lake Baikal, and nearly all are to the east of that lake. The Buriats are known 
to have moved into this region from the Amur District north of Manchuria in the 
13th century. 

(vi) Burmese A. A series from the neighbourhood of Moulmein was divided
into three groups of which the Burmese A, supposed true Burman, is one. Measurements
were taken by Miss Tildesley and means are given in Bm., Vol. XIII. 1921,
p. 239. In computing the coefficients we have added to these N∠ (66°.8 (38)) and
A∠ (70°.5 (38)). The palatal breadth was taken between the inner alveolar walls
at the second molars, and this is said to have been less in some cases than the
breadth between the inner rims of the alveoli of the second molars. The second of
these measurements, which is Martin's and the one generally used to-day, gives
a male mean of 41.6 (29), giving indices 100 G2/G1 = 84.0 (27) and 100 G2/G1′ =
{91.6 (29)}. These values were used in computing the coefficients. There are
44 male Burmese A skulls.

(vii) Chinese: Fukien. Gordon Harrower: “A Study of the Hokien and the
Tamil Skull.” Transactions of the Royal Society of Edinburgh, Vol. LIX. Part III
(No. 13), 1926, pp. 573–599. Measurements of 36 male skulls of unclaimed coolies
from the southern Chinese province of Fukien (Hokien) are given. The means are
quoted, with a few corrections, in Bm., Vol. XXIII. 1931, pp. 84–85.

(viii) Chinese (Koganei ). In 1902 Koganei gave measurements of the skulls of 
70 Chinese soldiers who had been killed in the war with Japan. The collection was 
made in the northern provinces of Shantung and Chihli and in Southern Manchuria, 
but the regions from which the soldiers came are unknown. Means are quoted in 
Bm., Vol. XVI. 1924, pp. 48—49. In calculating the coefficients we omitted the 
insufficiently defined palatal measurements and the profile angle was assumed to 
be from the prosthion and not from the alveolar point. 

(ix) Chinese: Peking. Davidson Black: “A Study of Kansu and Honan
Aeneolithic Skulls and Specimens from later Kansu Prehistoric Sites in Comparison
with North China and other recent Crania. Part I. On Measurements and





Identification.” Palaeontologia Sinica, Series D, Vol. VII. 1928, pp. 1–83. Measurements
are given of 86 male skulls from the northern provinces of China collected
in Peking dissecting rooms. The majority of the men came from Chihli, Shansi
and Shantung, but there were some from Shensi, Fengtien and Northern Honan.
Means are quoted in Bm., Vol. XXIII. 1931, pp. 84–85.

(x) Chinese: Prehistoric. Davidson Black: loc. cit. The Pooled Prehistoric series
described in this paper comprises skulls from Kansu and Honan of the Early
Bronze, Copper and Aeneolithic periods. It is shown that there is sufficient
justification for combining this material. There are 64 male specimens though
many of these are imperfect. Means are quoted in Bm., Vol. XXIII. 1931, pp. 84–85.

(xi) Chukchis. Julius Fridolin: “Tschuktschenschädel.” A.f.A., Bd. XXVIII.
Supplement, 1904, S. 1–17. Measurements of 35 male skulls are given. The
Chukchis inhabit the extreme north-east of Asia with the exception of some points 
on the coast which are said to be occupied by Eskimos. Of these specimens described 
by Fridolin 20 came from the Chukchi area proper and 15 from the Eskimo area. 
Montandon (loc. cit. infra, pp. 284–285) has given the means of 100 B/L, 100 H/L
and 100 NB/NH for the two groups separately and also for the corresponding female
and juvenile groups. The cephalic indices are practically identical in the case of the 
adult groups compared, while the Eskimo area group has the higher height-length 
and the lower nasal indices for both sexes. The difference between the male height- 
length indices is just significant and all others are insignificant. It is probable 
that significant differences between the types derived from the two areas would be 
found if more adequate material were available. In order to obtain a large enough 
sample for present purposes, however, we pooled all the specimens measured by 
Fridolin. Male means are given in Table I below and these were used in computing 
the coefficients with the other Asiatic series. Measurements of other male Chukchi 
skulls are given in the following sources: 

(a) Aleš Hrdlička: loc. cit., pp. 16–17 (5 crania, 4 of which may be of mixed
Chukchi and Eskimo origin).

(b) George Montandon: “Craniologie Paléosibérienne (Néolithiques, Mongoloïdes,
Tchouktchi, Eskimo, Aléoutes, Kamtchadales, Aïnou, Ghiliak, Négroïdes du
Nord).” L'Anthropologie, T. XXXVI. 1926, pp. 209–296 (10 crania, omitting No. 4
which is distorted, of which the majority are Chukchi proper and a few may be
of mixed origin).

The pooled cephalic index for Hrdlička's and Montandon's short series is 75.7 (14)
and for 9 characters a coefficient is found with Fridolin's series of 2.26 ± .32 (reduced
11.74 ± 1.65). The pooling of all the material is hardly justified.
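
The figures just quoted can be checked by simple arithmetic, assuming the usual probable error of the crude coefficient, $.67449\sqrt{2/M}$, and the usual reduction factor $50(\bar{n}_s+\bar{n}_{s'})/(\bar{n}_s\bar{n}_{s'})$; the working below is an illustration and is not given in the paper. With $M = 9$ characters:

\[ .67449\sqrt{2/9} \approx .318 \approx .32, \qquad \frac{11.74}{2.26} \approx 5.19, \qquad \frac{1.65}{.32} \approx 5.16 . \]

A factor of about 5.2 corresponds to $\bar{n}_s\bar{n}_{s'}/(\bar{n}_s+\bar{n}_{s'}) \approx 9.6$, which is of the order to be expected when a series of about 35 skulls is compared with one of about 14.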

(xii) Dayaks . Gerhardt von Bonin: loc. cit. Original measurements of three 
short series of Dayak skulls at Leiden were pooled with those of another described 
by Emil Schmidt. There are 55 of these male skulls from Borneo in all. 





(xiii) Dravidians. Means of two Indian series measured by Sir William Turner
are given by Miss B. N. Stoessiger in Bm., Vol. XIX. 1927, p. 128. The first is of
Dravidian skulls from the Central Provinces and Orissa, including one Tamil from
Madras, and the second of Dravidians (“Kolarians”) from Southern India. An
insignificant coefficient was found between these groups and hence they were pooled,
giving a total of 32 male specimens. These means are distinctly differentiated from
those of a Maravar (Dravidian) series from Madras. The Dravidian series used
below is the pooled one derived from Turner's measurements. It was necessary to
modify the published means: the horizontal circumference is Glabella U and not
U, the orbital breadth Lacrymal O1 and not O1, the transverse arc is Broca's Q, and
not Q′, and the mean should be 296.1 (31) in place of 302.0 (34); the angles of the
fundamental triangle are based on 28 skulls, not 30. The capacity was omitted in
calculating the coefficients.

(xiv) Hindus . Measurements of male Hindu crania were taken from the 
following sources: 

(a) Jacopo Danielli: “Studio sui crani bengalesi.” Archivio per l'Antropologia
e l'Etnologia, Vol. XXII. 1892, pp. 371–448. Measurements are given of 42 male
crania of Hindus of the inferior castes from the banks of the lower Ganges.

(b) Sir William Turner: “Contributions to the Craniology of the People of the
Empire of India. Part II. The Aborigines of Chúta Nágpúr and of the Central
Provinces, the People of Orissa, the Veddahs and Negritos.” Transactions of the
Royal Society of Edinburgh, Vol. XL. 1901, pp. 59–129. Measurements are given
(Tables VI–VIII) of 25 male skulls classed as Uriyá (or Ooriá). This is the
mother-tongue of the vast majority of the Hindu peoples of Orissa who inhabit
the plains.

(c) Paolo Mantegazza: “Studii sull' etnologia dell' India.” Archivio per l'Antropologia
e l'Etnologia, Vol. XIII. 1883, pp. 177–241. Measurements are given
(pp. 212–215) of 24 male skulls from Southern India.

Coefficients of racial likeness between these three groups of Hindu skulls are
given in Bm., Vol. XXB. 1928, p. 298; two values are insignificant and the other
is 1.74 ± .21. Pooled means are given in Table I below.

(xv) Japanese. Pooled means, based principally on the measurements of Ono
and Adachi, are given in Bm., Vol. XXIII. 1931, pp. 84–85. There are 138 male
skulls from different parts of both islands represented, but few means are available 
for more than 50 specimens. 

(xvi) Javanese: Bantam and Batavia. Gerhardt von Bonin: loc. cit. This series
from the west of the island comprises 55 male skulls preserved at Leiden. 

(xvii) Javanese: Middle and East . Ibid. The 65 male skulls at Leiden on which 
the means are based came principally from the middle and east of the island, 
though a few are from unknown localities. 




(xviii) Kalmucks . Measurements of Kalmuck skulls were taken from the 
following sources: 

(a) S. Sommier: “Note di Viaggio. II. Mordvá, Popolazione di Astrakan, Kalmucchi.”
Archivio per l'Antropologia e l'Etnologia, Vol. XIX. 1889, pp. 117–157
(7 crania from Astrakhan).

(b) Julius Fridolin: loc. cit. (v) c (9 crania from Astrakhan Province, 3 from
Tomsk Province and 3 from unknown localities). A few additional measurements of
these skulls taken by C. Mérejkowsky (Revue d'Anthropologie, 2e série, T. VII.
1884, pp. 296–297) could also be used.

(c) Michael Reicher: loc. cit. (19 crania from Astrakhan Province).

(d) Herman F. C. ten Kate: loc. cit. (1 cranium from Astrakhan Province and
3 from unknown localities). Additional measurements of 3 of these specimens given
by Haberkorn (loc. cit.) could also be used.

(e) J. W. Spengel: A.S.D., Göttingen Catalogue, 1874, S. 40 (3 crania from
the Astrakhan district and 3 from unknown localities).

(f) J. Deniker: “Étude sur les Kalmouks. Suite.” Revue d'Anthropologie, 2e
série, T. VII. 1884, pp. 277–310 (4 crania from unknown localities previously
measured by Quatrefages and Hamy).

The majority of the skulls forming these short series are known to have been 
obtained in the Astrakhan region, 3 came from Central Siberia and a certain 
number from unknown localities. Pooled means are given in Table I below. 

(xix) Mongols. Aleš Hrdlička: loc. cit., pp. 40–43. There are 114 male skulls 
from Urga in Northern Mongolia which is immediately to the south of and 250 miles 
distant from Lake Baikal. The capacities given were not used in calculating the 
coefficients. Means are quoted in Table I below. 

(xx) Nepalese. G. M. Morant: “A Study of certain Oriental Series of Crania 
including the Nepalese and Tibetan Series in the British Museum (Natural 
History).” Bm., Vol. XVI. 1924, pp. 1–104. There are 48 male skulls from different 
parts of the country. 

(xxi) Soyotes. G. Debetz: “The Anthropological Type of the Turanians of the 
Kemtchik and Tannu Regions (Soyotes).” North Asia, 1929, pp. 127–140 (in 
Russian). The Soyotes (Soyots, Soyons or Soiotes) are presumed to be of Finno- 
Turkic origin. They form a small community to-day in the border country between 
the Sayan and Altai mountains to the west of Lake Baikal. Means given for 9 male 
skulls from the Kemtchik and 31 from the central region are closely similar and 
the pooled values are given in Table I below. 

(xxii) Tagals. Gerhardt von Bonin: loc. cit. The Tagals (Tagálogs) of the 
Philippine Islands constitute the bulk of the population of Manila, Mindanao and 
central Luzon. They are classed as one of the brown, non-negrito tribes of the 
islands. Koeze gave measurements of a cranial series at Leiden and 31 male 
specimens were re-measured by von Bonin. 



(xxiii) Tamils. Gordon Harrower: loc. cit. Unclaimed bodies of 35 Tamil 
coolies were available for study at Singapore. Apart from the fact that the men 
came from Southern India, nothing is known of their origin. The means are quoted 
—a few being corrected—in Table I below. 

(xxiv) Telenghites. The Telenghites (or Teleuts) are a small Tatar tribe 
inhabiting to-day the lowlands of the Altai region of Southern Siberia between 
Lakes Balkash and Baikal. Means reduced from Reicher’s measurements of 60 male 
skulls are given in Bm., Vol. xxiii. 1931, pp. 84–85. 

(xxv) Tibetans A. G. M. Morant: loc. cit. Means are given for 37 male skulls 
from the south-west of the country which conform, as all the inhabitants of that 
district are supposed to do, to the Tibetan A type. 

(xxvi) Veddahs. Measurements of male Veddah skulls were taken from the 
following sources: 

(a) Sir William Turner: loc. cit. (7 crania). 

(b) Paul u. Fritz Sarasin: Ergebnisse naturwissenschaftlicher Forschungen auf 
Ceylon, Bd. iii. Wiesbaden, 1892–3, S. 198–307 (22 crania). Most of these skulls 
were subsequently measured by Lüthy (A. f. A., Bd. xxxix. 1912, S. 70) and a few 
of his additional measurements can be used. 

(c) Arthur Thomson: “On the Osteology of the Veddahs of Ceylon.” Journal 
of the Anthropological Institute, Vol. xix. 1889, pp. 125–158 (6 crania at Oxford). 

(d) William Henry Flower: Catalogue of the Specimens...in the Museum of the 
Royal College of Surgeons, Part I. 1907 (15 crania). 

(e) Rudolf Virchow: “Ueber die Veddahs von Ceylon.” Abhandlungen der 
Königl. Akademie der Wissenschaften zu Berlin, 1881 (1 cranium). 

(f) Rudolf Virchow: Zeitschrift für Ethnologie, Bd. XIV. 1882, Verhandlungen, 
S. 302, and Bd. xvii. 1885, Verhandlungen, S. 500 (3 crania). 

(g) Rüdinger: A. S. D., München Catalogue, 1892, No. 414c (1 cranium). 

The pooled means for these Veddah skulls are given in Table I below. 

These 26 series have been used in order to arrive at a preliminary classification 
of the races they represent, and for this purpose the coefficients of racial likeness 
were calculated for every possible pair. A few coefficients have also been found 
with the following shorter series which represents a particularly interesting racial 
type. 

(xxvii) G. M. Morant: “A First Study of the Tibetan Skull.” Bm., Vol. XIV. 
1923, pp. 193–260. The series comprises 15 male skulls from the eastern province 
of Khams said to belong to the Tibetan B type which is clearly differentiated from 
the Tibetan A. 





(3) Coefficients of Racial Likeness between Asiatic Series. When the coefficients 
of racial likeness between all the Asiatic series are compared, it is found that a sharp 
distinction is made between a group of six which are closely allied to one another 
and all the remaining ones (cf. Tables II, V and VI below). The coefficients between 
these six are given in Table II. Some of the regions represented are widely separated 
geographically. The vast majority of the Kalmuck skulls used come from the 
Astrakhan region which is 2000 miles to the west of the district inhabited by the 
neighbouring Telenghites (Teleuts), Buriats, Soyotes and Mongols; and it is more 
than 3000 miles from there to the Aleutian Islands. But all the races in the group 


[Fig. 2. The Relationships of Northern Mongolian Races. The diagram links the 
Mongols, Buriats, Kalmucks, Aleuts, Soyotes and Telenghites by their reduced 
coefficients of racial likeness, drawn in three classes: less than 5, 5–12 and 
12–19. The numbers in brackets are the cephalic indices of the series: Mongols 
(81.4), Buriats (83.1), Aleuts (82.1), Soyotes (82.3), Telenghites (86.4).] 


come from the northern parts of Asia and the islands in the Bering Sea, while the 
Chukchi is the only other type from that region for which we have adequate 
craniological data. The group may be referred to as that of the Northern Mongolian 
races, and it is probable that it would also include the Samoyed, Tungus, Ostiaks 
and Yakuts. The scheme of relationship suggested by the lowest reduced coefficients 
is shown in Fig. 2. It is surprising to find that the samples representing the 
Kalmucks and Soyotes may be supposed to represent identically the same racial 
type. In spite of the fact that no distinction can be made between these two when 
they are compared directly, their relationships to the other series are by no means 
the same. The Buriats are very close to the Kalmucks, but distinctly removed from 






TABLE I. 

Mean Measurements of Male Series of Asiatic Crania*. 

[Table not reproduced: the body of the table is illegible in this copy. The 
legible column headings on the continued page are Chukchis (Fridolin), Mongols 
(Hrdlička), Soyotes (Debetz), Kalmucks (pooled) and Aleuts (pooled).] 

* The means in this table have not been given before in Biometrika and those of the other series used in the present paper will be found in earlier volumes of this 
Journal. The capacities were omitted in the case of the pooled series as this measurement has been determined in a number of ways which may give sensibly different 
results. The mean indices and angles in curled brackets were found from the means of component absolute measurements instead of from individual values. 

TABLE II. 

Coefficients of Racial Likeness between Northern Mongolian Races*. 

[Table not reproduced in this copy.] 

* The numbers following the designation of the race are the mean numbers of skulls available for the characters used in computing the coefficients (the n’s) in the 
case of the comparison which involves the largest number of these characters for the particular series. When fewer than this maximum number can be used the n may 
differ to some extent from the one given. The numbers in brackets following the coefficients are the numbers of characters on which they are based. 

TABLE III. 

Coefficients of Racial Likeness between Indian Races*. 

[Table not reproduced in this copy.] 

* See footnote to Table II. 

the Soyotes. Each one of the five other series has its lowest reduced coefficient with 
the Kalmucks, but the very similar Soyotes do not occupy such a central position. 
The Telenghites are distinguished from the other series by having no close connection 
with any one of them. There seems to be little relation between the affinities of 
the types and their geographical positions, and this is doubtless due to the fact that 
most of them represent nomadic, or semi-nomadic, peoples. The same condition 
would doubtless explain why several pairs of widely separated races have such close 
relationships, but far more material might be needed to establish any close corre¬ 
spondence between the known migrations and the present-day affinities of the types. 

The second group of Asiatic races distinguished by the coefficients is one made 
up by all the Indian series available. Crude and reduced values are given in Table III 
and the arrangement suggested by the latter is shown in Fig. 3. The connections 
are less close than those between the Northern Mongolian races, there being no 
reduced coefficient less than 5 and only two—Hindus with Nepalese and Veddahs 
with Dravidians—less than 12. But when a rather higher limit is considered, the 
series are found to have numerous connections with one another. Every one has a 
reduced coefficient less than 19 with every other one, except in the case of the 
Veddahs and Tamils, and there the value is 21.66. It is interesting to note that the 
Veddahs and Dravidians have their lowest coefficient with one another, while no 
sharp distinction can be made between them and the other Indian races. There is 
a rough correspondence between the affinities and geographical positions of the 
races, though an exception to this is the fact that the Tamils from Southern India 
approach most closely to the Hindus of Bengal, while they stand appreciably closer 
to the Nepalese than to the Veddahs of Ceylon. 

The coefficients of racial likeness between 12 of the remaining series are given 
in Table IV and the connections provided by the reduced values less than 19 are 
shown in Fig. 3. A number of low values are found and every one of these types 
has at least one reduced coefficient less than 10. The group as a whole will be 
referred to as that of the Oriental races, and it can be distinguished clearly from 
both the Northern Mongolian and Indian groups (see Tables VI and VII). In spite 
of the large distances which separate the localities from which the three modern 
Chinese series and the Japanese were procured, the types are found to be closely 
similar*. The Prehistoric series is equally and more distantly related to the three 
others from China, but this connection is still more intimate than any which has been 
found between one or another of the Veddah, Dravidian, Tamil or Telenghite groups 
and any other Asiatic series. The races of China represented here and the Japanese 
may be considered to form a sub-group of the Oriental one, their closest relationships 
to one another being decidedly more intimate than any between them and other types. 
The remaining Oriental series also form a closely inter-related sub-group. It was 
surprising to find, as Dr von Bonin has observed, that the Tagals from the Philippine 

* The reduced coefficients in Table IV may be compared with that of 8.88 ± .19 between the male 
Farringdon Street and Whitechapel series which both represent the population of London in the 
17th century. 






Islands and the Dayaks from Borneo are almost identical in type. The only reduced 
coefficients less than 19 between the two sub-groups are those connecting the 
Prehistoric Chinese, Fukien Chinese and Japanese with the Tagals and Dayaks. 
The close link between the last and the Tibetans of the A type is also an unexpected 
relation. The connection between the Dayaks and one series from Java, which is 
itself closely connected with another series from Java and the Burmese A, might 
have been anticipated; but the fact—previously noted by Dr von Bonin—that the 
Aetas (a so-called negrito people from the Philippine Islands) are also intimately 
connected with the Javanese and Burmese is one of peculiar importance. The 
reduced coefficient between the Tagals and Aetas is 26.05 ± .65, but the latter have 
links with other supposed non-negrito races of the Orient which are of precisely the 
same order as the lowest which can normally be found between one Asiatic series 
and another. This in itself provides sufficient justification for questioning the 
validity of the negrito hypothesis, and other evidence considered below raises the 
same doubts. It may be noted that there is, in general, no very close correspondence 
between the racial affinities of the series, as measured by these methods, and the 
geographical positions of the Oriental populations they represent, except in the case 
of the Chinese and Japanese groups. Migrations, of which some are known to have 
taken place in recent times, may well account for this condition. Contrasted with it 
is the extraordinary uniformity of the Chinese type. If more adequate material were 
available from remote parts of the country, there can be little doubt that distinct 
racial differences would be found, but there is already a clear suggestion that the 
greater part of the enormous population of China conforms more closely to a single 
racial type than do the present-day populations of several European countries*. 

It has been suggested above that the coefficients of racial likeness make possible 
a division of the majority of the Asiatic series into three distinct groups, or four 
may be distinguished if the Oriental one is divided into two. Such a classification 
is, of course, only provisional and it is not unlikely that the apparent divisions 
between the groups will become less precise as more material becomes available. The 
arrangement arrived at is shown in Figs. 2 and 3, and it depends only on the evidence 
provided by the lowest reduced coefficients, none greater than 19 being considered 
in the diagrams. The experience derived from a similar comparison of European and 
other races has suggested that the most consistent results are to be obtained by 
considering only the closer degrees of affinity. The majority of the larger coefficients 
between the Asiatic series have yet to be presented, but before doing this it will 
be convenient to notice the order of the highest reduced values found between pairs 
of series belonging to the same group. In the case of the Northern Mongolian races 
(Table II) the highest coefficient is 51.07 between Telenghites and Mongols, but if 
the Telenghites are omitted the extreme is 25.01 between Buriats and Aleuts. The 
greatest divergence between two Indian series (Table III) is found in the case of 

* Cf. “The Use of Biometric Methods applied to Craniology.” Biometrika, Vol. xviii. 1926, 
pp. 414–417. The coefficients of racial likeness are given in the above paper between a Northern and a Southern 
Chinese series compiled from various sources, but not used in the present paper, and the Fukien and 
Koganei’s series. All are of a low order. 





the Veddahs and Tamils having a reduced coefficient of 21.66. For the Oriental 
series the maximum value is 66.24 between the Chinese (Koganei) and Javanese 
(Bantam and Batavia): for the Japanese and Chinese alone the extreme is 13.61 in 
the case of the Japanese and Prehistoric Chinese, and for the other Oriental races 
it is 32.08 between the Tagals and Javanese (Bantam and Batavia). If the Oriental 
group is sub-divided in this way, and if the Telenghites are omitted from the 
Northern Mongolians, then the maximum reduced coefficient between two members 
of the same group is of the order 30: if these restrictions are not made the maximum 
is of the order 56. 
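
The rule used for the diagrams, that only reduced coefficients below 19 are drawn, can be sketched in Python. This is a minimal illustration and not the authors' procedure: the function name `links_below` and the dictionary layout are assumptions, and the values are only a handful of the within-group maxima quoted in the text, not the full tables.

```python
# Reduced coefficients quoted in the text for a few within-group pairs.
# Partial list for illustration only; the full sets are in Tables II-IV.
quoted = {
    ("Telenghites", "Mongols"): 51.07,
    ("Buriats", "Aleuts"): 25.01,
    ("Veddahs", "Tamils"): 21.66,
    ("Japanese", "Prehistoric Chinese"): 13.61,
    ("Tagals", "Javanese (Bantam and Batavia)"): 32.08,
}


def links_below(coefficients, limit=19.0):
    """Return the pairs whose reduced coefficient is below `limit`, closest first."""
    kept = [(pair, value) for pair, value in coefficients.items() if value < limit]
    return sorted(kept, key=lambda item: item[1])


diagram_links = links_below(quoted)
for (first, second), value in diagram_links:
    print(f"{first} - {second}: {value}")
```

Of the five quoted maxima only the Japanese and Prehistoric Chinese pair falls under the limit, which is consistent with the remark that the diagrams show the closer degrees of affinity alone.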

We may turn now to Table V which gives the crude and reduced coefficients of 
racial likeness between the six Northern Mongolian and the five Indian races. The 
30 reduced values range from 121.1 to 429.6, and the last appears to be the highest 
that has yet been found in comparisons between any pair of races in the world. 
Crude coefficients of racial likeness between all possible pairs of 41 European and 
Egyptian series have been published*, and when these 820 values are reduced the 
highest is found to be 174.0 between a British Neolithic and a Bavarian (Waischenfeld) 
series. Much greater divergences than this are evidently found in Asia. The 
greater reduced coefficients in Table V still, however, permit us to arrange the 
series in an orderly sequence. The Kalmucks, Telenghites, Buriats, Soyotes and 
Mongols all have their highest values with the Veddahs, their next highest with the 
Dravidians and so on in the order of the five Indian series given in the table. The 
Aleuts give the very similar order : Veddahs—Dravidians—Tamils—Hindus— 
Nepalese. In spite of the fact that all the Indian races are widely removed from 
all the Northern Mongolian races, there is thus clear evidence that the latter group, 
considered as a whole, resembles the Nepalese far more closely than it does the 
Veddah type. The linear arrangement given to the different Indian series by this 
means was not found to express their true relationships when they were compared 
directly. Reading the table in the other direction, it will be seen that there is less 
uniformity in the orders in which the six Northern Mongolian series are arranged 
by their reduced coefficients with the different Indian series. The lowest values are 
with the Kalmucks in three cases and with the Telenghites in the other two, while 
the Northern series furthest removed from the Indians are the Mongols and Aleuts. 
These arrangements are such as can reasonably be accepted as indicating the true 
and somewhat complex relationships of the various types. The crude coefficients 
would suggest a different and a less consistent scheme of relationship which is far 
less likely to correspond to the actual racial links. 
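
The counts of comparisons quoted above follow from simple combinatorics, and a short Python check may make them easier to verify. The function names `pairs_within` and `pairs_between` are illustrative assumptions; only the numbers 41, 820, 6, 5 and 30 come from the text.

```python
from math import comb


def pairs_within(n_series):
    """Number of unordered pairs that can be formed among n_series series."""
    return comb(n_series, 2)


def pairs_between(n_first, n_second):
    """Number of pairs taking one series from each of two separate groups."""
    return n_first * n_second


# 41 European and Egyptian series compared in all possible pairs.
print(pairs_within(41))
# Six Northern Mongolian series compared with five Indian series (Table V).
print(pairs_between(6, 5))
```

The first call gives the 820 coefficients mentioned for the European and Egyptian series, and the second gives the 30 reduced values of Table V.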

The 72 coefficients between the Northern Mongolian and Oriental series are 
given in Table VI. The range of the reduced values is from 43.2 (Burmese A and 
Telenghites) to 184.1 (Chinese Prehistoric and Aleuts), and there are 12 values less 
than 60. The closest connections here thus indicate a rather more intimate degree 

* G. M. Morant: “A Preliminary Classification of European Races based on Cranial Measurements.” 
Biometrika, Vol. xxB. 1928, pp. 301–375. The reduced coefficients derived from the crude ones given in 
this paper have not been published. 



TABLE V. 

Coefficients of Racial Likeness between Northern Mongolian and Indian Races*. 

[Table not reproduced in this copy.] 

* See footnote to Table II. 



of relationship than the most distant found between any pair of series belonging to 
the same group. The absence of any coefficients less than 40 suffices to make a 
trenchant division between the Northern Mongolian and Oriental groups. As 
a similar treatment of material from other parts of the world has shown, the most 
convincing and reliable classification is obtained by considering almost exclusively 
the closest degrees of relationship, and without attempting to reconcile the system 
built up in this way with one which might be derived from considering only the 
values of distant relationships. The reduced coefficient of 184.1 is greater than 
several found between the Tamils and Nepalese and the Northern Mongolian series 
(see Table V), but it is nevertheless true that the latter group, considered as a whole, 
resembles the Oriental far more closely than it does that of the Indian races. It 
might be expected that the Northern Mongolians would bear a closer resemblance 
to the Chinese and Japanese than to the other Oriental races, but this is not found 
to be true. The orders in which the reduced coefficients with the single Northern 
Mongolian series arrange the Oriental series are not closely similar—no two being 
exactly alike—but the Burmese, Japanese and Chinese (Koganei) tend to have the 
lowest values, while the highest are shown in most cases with the Prehistoric Chinese 
and Peking Chinese series. The different Chinese types are thus contrasted with 
one another in this way, and it is the southern type which resembles the Northern 
Mongolians more closely than the northern type. This inversion of the order 
we should have expected need not be emphasised, as all the coefficients concerned 
are high, and the conclusion that each Northern Mongolian race is equally related to 
all the Oriental races is, perhaps, the safest to accept in the present state of our 
knowledge. It is possible that some intermediate types may be found which will 
connect the two groups, but such are only likely to represent Manchuria and possibly 
Korea. It may be observed that the Telenghites and Kalmucks tend to be the 
Northern Mongolian series which most closely resemble the Oriental ones, while the 
Aleuts and Mongols are those furthest removed, but there is little uniformity in 
this matter. 

Table VII gives the coefficients of racial likeness between the Indian and Oriental 
races. The reduced values range from 12.6 to 159.0 and there are 15 less than the 
lowest (43.2) found between a Northern Mongolian and Oriental series. In every 
case but one the Indian race with the lowest coefficient is the Nepalese, the 
exception being that the Tamils have a slightly, though not significantly, lower 
value with the Burmese A. It may be remembered that the Tamil skulls were 
collected in Singapore, though there can be little question as to their authenticity 
as all their closest relationships are with Indian series. The Hindus and Tamils 
appear to be almost equally removed from the Oriental populations, the Dravidians 
are more distant and the Veddahs most distant of all. Judging from all the 
coefficients, the Veddahs are rather further removed from the Oriental races 
(cf. Tables VI and VII) than are the Kalmucks, Telenghites or Buriats. By reading 
Table VII in a horizontal direction it can be seen that the Tibetan A, Dayak and 
Tagal series are the Oriental ones which most closely resemble the Indian, while 
the Chinese and Japanese group, on the one hand, and the Malayan and Aetas, on 




the other, are distantly and almost equally removed. The Indian and Oriental races 
are thus connected up by way of Nepal and Tibet, while there is no suggestion, as 
far as can be seen from this material, of a linkage south of the Himalayas. The 
division between the two groups (see Fig. 3) appears to be justified by the fact that 
the Tibetan A skulls were collected from the south of the country which is 
conterminous with Nepal, while the Nepalese have a decidedly more intimate 
connection with an Indian race and the Tibetans with an eastern one. It is quite 
possible, however, that this division would be found less marked if more abundant 
material were available. 

A provisional classification into three groups is reached by these means. Every 
series has its closest connection, and in most cases all its closest connections, with 
other series in the same group. Importance is only attached to the closest degrees 
of relationship and there are numerous examples of two series belonging to the 
same group being further removed from one another than one, or both, of them is 
from series belonging to other groups. We are really dealing, of course, with a 
continuous system, and a diagram such as Fig. 3 illustrates the true state of affairs 
far better than any system of grouping can. 

In the comparisons between the various Northern Mongolian races every series 
has one, or more, reduced coefficients less than 16, and if the Telenghites are omitted 
each series has one or more reduced coefficients less than 8 (Table II). The same 
limit for the Indian series (Table III) is 14 and for the Oriental (Table IV) 10. 
We may now consider the relationships of four other types which are supposed 
specialised since they cannot be placed in any one of the three groups of races 
distinguished. Crude and reduced coefficients of racial likeness are given in 
Table VIII. The Aino have one fairly close connection with the Japanese, the reduced 
value being 12.54. It may be suggested that this is sufficiently low to warrant 
the inclusion of the Aino in the Oriental group, but this seems inadvisable as the 
series has no other reduced coefficient less than 24 with members of that group. 
As has been suggested before*, the Japanese appear to have resulted from a cross 
between the Southern Chinese and the primitive island race, being now more 
closely related to the mainland type. If this be so, then the Aino may have had 
a very different origin from the bulk of the Oriental races. It may be noticed that 
they resemble the Tagals more closely than they do any Chinese race. 

It is not surprising to find that the Chukchis from the extreme north-east of 
the continent are of another aberrant type. They are widely removed from the 
other Siberian races, and from the Aleuts, and the only reduced coefficient less 
than 20 with the series so far considered is the value of 18.27 with the Prehistoric 
Chinese. The three other Chinese series are almost as close, but it is suggestive 
that the prehistoric one should be distinguished from them in this way. 

The Tibetan B skulls, from the easterly province of Khams, are known to represent 
a different race from those of the A type which come from the south-west of 
Tibet. The former series comprises 15 male specimens and the sample is hence 

* Biometrika, Vol. xvi. 1924, pp. 61–62. 





too small to lead to any reliable results. The few coefficients in Table VIII are 
suggestive, however. The only reduced value less than 20 is that of 14.46 with 
the Chukchis. All the Chinese types are further removed, but the closest connection 
here is with the Prehistoric series, and the Tibetan A type is more distant still. 

The relationships of 25 Asiatic series (excluding the Tibetan B) have been 
considered above and the distribution of the lowest reduced coefficient found for 
each series ranges from a negative value (Kalmucks and Soyotes) to 15.55 (Kalmucks 
and Telenghites). The lowest with the Andamanese is that of 41.79 with the 
Tibetans A. There is, in general, a fairly close correspondence between the crude 
coefficient of racial likeness for all characters and for indices and angles alone, 
though a few marked differences may be found. Ignoring cases for which the 
second coefficient is based on fewer than 5 characters, there are found to be 
5 examples in Tables II–VII of the value for all characters being more than 
3 times the other. These are: Javanese: Bantam and Batavia with Javanese: Middle 
and East (4.2 times), Dayaks with Tagals (5.5), Hindus with Nepalese (6.0); Chinese: 
Koganei with Chinese: Peking (11.1) and Tagals with Nepalese (4.1). The first 
three of these pairs are almost as closely related as any pairs of Asiatic series (see 
Fig. 3), the two Chinese types are also similar to one another and the Tagals and 
Nepalese are not widely removed. The same discordance is found between the 
Aino and Dayaks (3.0) and Aino and Nepalese (5.8), and a comparison of the means 
shows at once that it is the large size of the Aino skull which distinguishes it 
in these cases more than the shapes of its parts. Examining the Andamanese 
coefficients in the same way (Table VIII) it is found that the value for all characters 
is greatly in excess of the other in the comparisons with Aetas (3.1), Chukchis (3.1); 
Javanese: Bantam and Batavia (3.7); Chinese: Fukien (4.0), Dayaks (4.0), Aino (4.5); 
Javanese: Middle and East (10.4) and Burmese A (15.0). But it is the small size of 
the Andamanese skull which is its most distinguishing characteristic. The shape 
is very similar to those of the neighbouring Burmese and Javanese. The close 
resemblance between the shapes in these cases cannot be supposed fortuitous and 
it seems to point clearly to the fact that the Andamanese came originally from 
Java, or from the neighbouring mainland, and that they have degenerated since. 
This theory cannot be reconciled with the negrito hypothesis however. It may be 
noted that of the four most specialised Asiatic types with which we are dealing 
two are found on islands while the Chukchis and Tibetans B represent peculiarly 
segregated races inhabiting regions which are among the most difficult to reach 
in Asia. 
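
The screening rule described in this paragraph, which flags a pair when its crude coefficient for all characters is more than 3 times that for indices and angles alone and ignores pairs where the second coefficient rests on fewer than 5 characters, can be sketched as follows. This is an illustration under stated assumptions: the function name `is_discordant` is hypothetical, the input values in the example are invented, and the treatment of a zero or negative second coefficient (returning `False`) is an assumption, since the text does not say how such cases were handled.

```python
def is_discordant(all_characters, indices_and_angles, n_index_characters,
                  ratio=3.0, min_characters=5):
    """Flag a pair whose all-character coefficient is more than `ratio` times
    the coefficient for indices and angles alone.

    Pairs with fewer than `min_characters` indices and angles are ignored,
    as in the text. A non-positive second coefficient is also skipped here;
    that choice is an assumption of this sketch.
    """
    if n_index_characters < min_characters:
        return False
    if indices_and_angles <= 0:
        return False
    return all_characters / indices_and_angles > ratio


# Hypothetical inputs, not values from the paper's tables.
print(is_discordant(12.0, 3.0, 8))   # ratio 4.0 with 8 characters
print(is_discordant(12.0, 3.0, 4))   # too few indices and angles
print(is_discordant(6.0, 3.0, 8))    # ratio 2.0 only
```

A large ratio of this kind points to a difference of size rather than of shape, which is the reading the authors give to the Aino and Andamanese comparisons.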

(4) Comparison of Single Measurements. When making comparisons between 
a considerable number of series, and at the same time making use of a considerable 
number of characters, as in the present case, the anthropometrician is accustomed 
to find that different measurements suggest incompatible schemes of relationship 
and there may be no sufficient reason why one of these should be trusted more 
than another. Use of the method of the coefficient of racial likeness frees him from 
this dilemma. It may be asked whether the classification suggested above can be 


confirmed by a comparison of single measurements. The question whether a difference 
between two means is significant or not can be answered at once from the 
value of α between them found in computing the coefficients. The 31 characters 
compared in this way need be the only ones considered now. An α is supposed to 
indicate a significant difference if it is greater than 10, and its value, when greater 
than this limit, will indicate the degree of significance. 
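
The significance rule just stated can be expressed as a small Python helper that reports, for one character, what fraction of the available pairwise comparisons exceed the limit of 10. This is only a sketch: the name `significant_fraction` is hypothetical and the example values are invented for illustration; the per-pair statistic itself (written α in the paper) is taken as given and is not computed here.

```python
def significant_fraction(alphas, limit=10.0):
    """Fraction of pairwise alpha values for one character that exceed `limit`."""
    if not alphas:
        raise ValueError("no comparisons available for this character")
    return sum(1 for value in alphas if value > limit) / len(alphas)


# Hypothetical alpha values for one character across four pairs of series.
example_alphas = [2.1, 14.0, 35.5, 8.0]
share = significant_fraction(example_alphas)
print(f"{share:.0%} of the comparisons indicate a significant difference")
```

The paragraph that follows treats a character as of little value for classification when this share is below 40 per cent.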

It will be sufficient to consider at the moment the evidence afforded by a 
comparison of single measurements within the group of 12 Oriental races which 
has been differentiated from other groups. It is found in this case, as is usual, that 
there are enormous differences between the characters in the capability of each to 
distinguish the racial types. Only 9 out of the 66 possible comparisons can be made 
in the case of the capacity and no one of these is significant. The foraminal index 
is available for all the series, but no α is found greater than 10. When it is found, 
in the case of a particular measurement, that less than 40 per cent. of the possible 
comparisons indicate a significant difference, it may be assumed that the measurement 
in question will be of little value either in suggesting a new arrangement of 
the series, or in confirming the one provided by the coefficients of racial likeness. 
In the present case, 21 of the 31 characters need not be considered in detail for this 
reason. These are, in addition to the two mentioned above, and in order of the 
percentage of α’s greater than 10: fml (1.5), fmb (1.5), G₁ (1.8), 100 G₂/G₁ (2.2), 
A∠ (10.7), J (13.6), N∠ (14.3), G, (19.6), O₂ (21.8), 100 O₂/O₁ (22.6), S′ (22.7), 
LB (24.2), 5 (28.8), U (30.3), P∠ (33.3), 100 G'iT/<?5 (33.3), O₁ (34.0), 'l0()^7Z (35.9) 
and H′ (35.9). No α’s for these characters exceed 80. The means for the remaining 
ten are given in Table IX. For the cephalic and nasal indices and for the calvarial 
length more than half of the possible comparisons are found to indicate significant 
differences. In the upper part of the table the Southern Oriental series (see Fig. 3) 
are arranged in order of their cephalic indices and below them are the Chinese 
and Japanese in the same order. In spite of the clear distinction which was made 
between the two sub-groups, these distributions for the most variable single 
character are found to be overlapping. This condition is also found for all the 
other measurements except the nasal height and index—for which several means 
are missing—and for the upper facial index (100 G'H/GB). It is evident that the 
greater number of the clearly significant differences, and all the largest values of α,
are found between comparisons of series not in the same sub-group. A direct 
comparison of the means might have suggested the division between the Chinese 
and Japanese, on the one hand, and the remaining Oriental series, on the other, 
but it would certainly not have led to any consistent scheme of relationship within 
each of these divisions such as those furnished by the lowest coefficients of racial 
likeness. 
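
The screening rule used above (a character is set aside when fewer than 40 per cent. of the possible comparisons give α greater than 10) can be sketched as follows. The function names are hypothetical, and the lists of α values are invented for illustration; only the limits of 10 and 40 per cent. and the counts of 12 and 10 series come from the text and Table IX.

```python
from math import comb

SIGNIFICANCE_LIMIT = 10.0  # alpha above this is taken as significant
MIN_PERCENT = 40.0         # characters below this percentage are set aside


def possible_comparisons(n_series):
    """Number of distinct pairs among n_series series."""
    return comb(n_series, 2)


def percent_significant(alphas):
    """Percentage of the available alphas exceeding the limit."""
    if not alphas:
        raise ValueError("no comparisons available for this character")
    significant = sum(1 for a in alphas if a > SIGNIFICANCE_LIMIT)
    return 100.0 * significant / len(alphas)


def worth_considering(alphas):
    """True when at least 40 per cent. of the comparisons are significant."""
    return percent_significant(alphas) >= MIN_PERCENT


print(possible_comparisons(12))  # the 12 Oriental series
print(possible_comparisons(10))  # a character available for only 10 series

# Invented alphas: 38 significant values out of 66 comparisons.
invented = [12.0] * 38 + [3.0] * 28
print(round(percent_significant(invented), 1), worth_considering(invented))
```

A character missing for some series has fewer than 66 available comparisons, so the percentage is taken on the comparisons that can actually be made.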

It may be asked whether the same characters are also those most capable 
of distinguishing the races within the other two groups. For the Northern 
Mongolian series the measurements showing 40 per cent., or more, of significant 
differences are: NH' (46.7 per cent.), 100 B/L (40.0), L (40.0), G'H (40.0) and





TABLE IX.

Characters most capable of distinguishing Oriental series: means and α's*.

Series                          100 B/L  100 NB/NH'      L    NH'    G'H     NB      S  Oc. I.     Q'  100 B/H'

Aetas                              84.0        54.1  171.0   49.5   69.6   26.8  360.0    65.0  323.7     106.7
Javanese: Bantam and Batavia       83.0        53.5  169.9   49.3   69.8   26.3  353.3    66.2  313.9     104.9
Burmese A                          82.9           -  173.5      -   71.4   28.1  363.7    62.8  325.8     106.7
Javanese: Middle and East          82.0        53.1  173.7   50.6   70.8   26.8  360.4    65.4  318.4     105.1
Tibetans A                         79.2           -  175.7      -   68.7   25.7  361.2    61.8  306.5     106.1
Dayaks                             78.4        54.5  176.6   50.2   69.7   27.2  365.0    63.2  312.6     102.9
Tagals                             77.5        55.5  179.2   50.0   70.0   27.7  375.0    62.6  317.5     101.2

Chinese: Fukien                    78.7        48.1  179.9   52.6   73.8   25.2  377.0    61.8  322.0     102.3
Chinese (Koganei)                  78.0           -  180.1      -   75.2   25.0  373.2       -  325.2         -
Chinese: Peking                    77.6           -  178.5      -   75.3   25.0  370.0    61.1  317.0     100.5
Japanese                           77.5        48.8  180.5   52.0   71.3   25.3  370.5       -  320.6         -
Chinese: Prehistoric               76.0           -  180.3      -   75.2   25.8  371.9    61.9  312.3     100.8

No. of α's                           66         56†     66    57†     66     66     66      45     66       64‡
Per cent. of α's > 10              57.6        55.4   51.5   49.1   48.5   48.5   48.5    42.2   40.9      40.6

* All the means in this table are based on 23 or more male skulls.
† Including some comparisons between Frankfurt nasal heights (NH) or indices.
‡ Including some comparisons between 100 B/H's.

O₁ (40.0). The measurements most capable of differentiating the Indian races from
one another are: 100 O₂/O₁ (75.0), Q' (60.0), G'H (60.0), NH' (42.9), 100 B/L (40.0),
100 H'/L (40.0), S' (40.0), and fml (40.0). There is little agreement between these
two lists and that in Table IX, although the cephalic index and nasal and upper
facial (G'H) heights are common to all three of them.

While no detailed analysis along these lines is likely to prove profitable, it will 
be convenient to grade the characters roughly according to their capability of 
differentiating Asiatic types. A comparison of the percentages of significant α's
suggests the following arrangement, the groups referred to being the three dealt
with in Tables II–VII above:

(a) Characters showing numerous marked intra- and inter-group differences
throughout: 100 B/L, G'H and NH'.

(b) Characters showing numerous marked inter-group differences throughout
and some marked intra-group differences: 100 H'/L, 100 B/H', 100 NB/NH'
and NB.

(c) Characters showing numerous marked inter-group differences throughout
and hardly any marked intra-group differences: B, J, U, H' and O₂.

(d) Characters showing some marked differences in intra- or inter-group
comparisons: L, S', Q', Oc. I., 100 O₂/O₁, fml, S, 100 G'H/GB, fmb, P∠, O₁ and A∠.

(e) Characters showing few significant differences in any comparisons: N∠,
LB, 100 fmb/fml, C, G₁, G₂ and 100 G₂/G₁.

There are more measurements in division ( d) than in any other and a comparison 
of these does not lead to any consistent arrangement of the types. Those in the 
first three divisions are the only ones likely to make any clear distinctions between 
the different groups. The ranges for these are given in Table X. If the Chinese 
and Japanese series are kept separate from the others in the Oriental group, as in 
this table, then there is no single measurement which makes absolute distinctions 
between all four groups. If all the Oriental series are grouped together, then B, J
and 100 B/H' are the only characters which make absolute distinctions between
the three groups. It is clear that the cephalic index considered alone may be a 

TABLE X.

Ranges of Mean Measurements for different Groups of Asiatic Races (cf. Figs. 2 and 3)*.

Races                   100 B/L            B                  J                  100 H'/L

Northern Mongolian      81.4–86.4 (6)      148.7–151.5 (6)    139.8–144.0 (6)    70.0–73.7 (6)
Chinese and Japanese    76.0–78.7 (5)      138.2–140.9 (5)    132.2–135.6 (5)    76.0–77.0 (3)
Other Oriental          77.5–84.0 (7)      138.2–143.7 (7)    131.0–134.7 (7)    74.7–79.4 (7)
Indian                  72.0–75.1 (5)      128.5–132.6 (5)    124.3–127.8 (5)    73.4–76.2 (5)

Races                   100 B/H'           H'                 G'H                NH'

Northern Mongolian      113.0–118.0 (6)    127.6–131.1 (6)    71.3–77.6 (6)      52.4–56.6 (6)
Chinese and Japanese    100.5–102.3 (3)    137.0–137.8 (3)    71.3–75.3 (5)      52.0–52.6 (2)
Other Oriental          101.2–106.1 (7)    130.9–137.2 (7)    68.7–71.4 (7)      49.3–50.6 (5)
Indian                  96.1–99.6 (5)      132.4–136.3 (5)    61.3–67.9 (5)      45.0–49.4 (4)

Races                   O₂                 NB                 100 NB/NH'         U

Northern Mongolian      34.1–36.2 (6)      25.5–27.5 (6)      48.4–51.4 (6)      516.1–524.9 (3)
Chinese and Japanese    33.8–35.5 (4)      25.0–25.8 (5)      48.1–48.8 (2)      502.2–511.6 (5)
Other Oriental          33.3–35.0 (7)      25.7–28.1 (7)      53.1–55.5 (5)      491.8–508.3 (7)
Indian                  31.2–33.3 (5)      24.2–25.7 (5)      50.6–52.9 (4)      489.0–498.0 (4)

* All means used in compiling this table are based on 16 or more crania.


most unreliable guide to racial affinity. The ranges of the Northern Mongolian 
and Southern Oriental groups (excluding the Chinese and Japanese) overlap for 
this character, but the ranges for B, J, 100 H'/L, 100 B/H', NH', 100 NB/NH'
and U are discrete and quite widely separated. 
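
The statements about discrete and overlapping ranges can be checked mechanically. The sketch below is only an illustration: the ranges are those read from Table X, the helper names are hypothetical, and two ranges are treated as discrete when they share no value, which is an assumption about what is meant by an absolute distinction.

```python
# Ranges (min, max) read from Table X, in the order: Northern Mongolian,
# Chinese and Japanese, Other Oriental, Indian. A few digits are hard to
# read in the printed table; none of the results below depends on them.
TABLE_X = {
    "100 B/L":    ((81.4, 86.4), (76.0, 78.7), (77.5, 84.0), (72.0, 75.1)),
    "B":          ((148.7, 151.5), (138.2, 140.9), (138.2, 143.7), (128.5, 132.6)),
    "J":          ((139.8, 144.0), (132.2, 135.6), (131.0, 134.7), (124.3, 127.8)),
    "100 H'/L":   ((70.0, 73.7), (76.0, 77.0), (74.7, 79.4), (73.4, 76.2)),
    "100 B/H'":   ((113.0, 118.0), (100.5, 102.3), (101.2, 106.1), (96.1, 99.6)),
    "H'":         ((127.6, 131.1), (137.0, 137.8), (130.9, 137.2), (132.4, 136.3)),
    "G'H":        ((71.3, 77.6), (71.3, 75.3), (68.7, 71.4), (61.3, 67.9)),
    "NH'":        ((52.4, 56.6), (52.0, 52.6), (49.3, 50.6), (45.0, 49.4)),
    "O2":         ((34.1, 36.2), (33.8, 35.5), (33.3, 35.0), (31.2, 33.3)),
    "NB":         ((25.5, 27.5), (25.0, 25.8), (25.7, 28.1), (24.2, 25.7)),
    "100 NB/NH'": ((48.4, 51.4), (48.1, 48.8), (53.1, 55.5), (50.6, 52.9)),
    "U":          ((516.1, 524.9), (502.2, 511.6), (491.8, 508.3), (489.0, 498.0)),
}


def discrete(r1, r2):
    """True when the two closed ranges have no value in common."""
    return r1[1] < r2[0] or r2[1] < r1[0]


def merge(r1, r2):
    """Smallest range covering both r1 and r2."""
    return (min(r1[0], r2[0]), max(r1[1], r2[1]))


def all_discrete(ranges):
    """True when every pair of ranges in the list is discrete."""
    return all(discrete(a, b)
               for i, a in enumerate(ranges) for b in ranges[i + 1:])


# Northern Mongolian against the Oriental series other than Chinese and Japanese.
mongolian_vs_other_oriental = [c for c, r in TABLE_X.items() if discrete(r[0], r[2])]

# Three groups: all the Oriental series taken together.
three_groups = [c for c, (nm, cj, oo, ind) in TABLE_X.items()
                if all_discrete([nm, merge(cj, oo), ind])]

# Four groups: Chinese and Japanese kept separate.
four_groups = [c for c, r in TABLE_X.items() if all_discrete(list(r))]

print(mongolian_vs_other_oriental)
print(three_groups)
print(four_groups)
```

Run on the tabled ranges, the three lists agree with the statements in the text: seven characters separate the Northern Mongolian from the other Oriental series, three separate the three groups, and none separates all four.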

(5) Conclusions. The classification of Asiatic races presented in this paper has
been reached by using purely quantitative methods. In its broad outline it agrees 
with most other classifications which have been suggested by anthropologists using 
less exact methods. There are 26 cranial series for which adequate measurements
have been provided and a complete comparison was made between these by the
method of the coefficient of racial likeness. The classification arrived at, shown in 
Figs. 2 and 3, was obtained by considering only the lowest orders of reduced
coefficients. It has been found from the inter-comparisons of other groups of allied types
that the most consistent results are always reached in this way. The Asiatic series 
can be divided into a number of distinct groups. The first of these (see Fig. 2) 
includes all from Mongolia and Siberia, with the exception of the Chukchi series 
from the extreme north-east of the continent, and one of Aleutian crania is also 
included. There appears to be a remarkable racial uniformity among a number of 
peoples dispersed over an enormous area stretching from Astrakhan to the
Aleutian Islands. The series belonging to this group resemble one another closely
and they are all markedly dissimilar from any other series available. The
uninhabited desert of Gobi and the mountains to the north of Tibet are known to have
shut off these Northern Mongolian peoples from their nearest neighbours to the
south, and the craniological evidence suggests that there has been little, if any,
admixture with the Northern Chinese. A second group of closely allied types is
furnished by all the Chinese series available, including the Prehistoric series, and 
the Japanese. These show closer affinities to one another than to any races outside 
China and Japan, but no hard and fast line can be drawn between them and some 
Southern Oriental racial types. The Southern (Fukien) and Prehistoric Chinese 
and the Japanese are linked to the Tagals of the Philippine Islands and to the 
Dayaks of Borneo. The last two form a Southern Oriental group together with 
the Burmese, Javanese, Aetas—a so-called negrito people from the Philippine 
Islands—and the Tibetans of the A type who inhabit the south-west of their 
country. A fourth group is made up by all the Indian series available, and no sharp 
division can be made between the primitive Veddahs and Dravidians, on the one 
hand, and the Hindus and Nepalese, on the other. The contiguous Nepalese and 
Tibetan A peoples are racially connected, though by a bond which is far less 
intimate than that between the Nepalese and Hindus, or that between the Tibetans 
and Dayaks. In this way all the Indian and Oriental types can be connected up to 
form, as it were, a continuous system with no wide gaps at any point, but with 
three centres round which there is a closer clustering of the types. There are
three of the 26 series which cannot be assigned to any of these groups. The
Andamanese conform to a type which is peculiar on account of its small size. But its shape
is very similar to those of the Burmese and Javanese and it is reasonable to suppose 
that this indicates a true relationship, and that the stock degenerated after reaching 
the islands and there becoming isolated. An Aino series has only one close
connection which is with the Japanese. The latter thus occupies an intermediate position
between the Southern Chinese, to which it is far more closely related, and the 
Aino, and this is possibly an example of racial crossing. The third peculiar type 
is found not in an island but in the inaccessible region lying at the extreme
north-east of the continent. The Chukchis are markedly dissimilar to all the series of the
Northern Mongolian group and they resemble most closely that of the Prehistoric
Chinese. The fact that there are no connections with any of the Modern Chinese
types is noteworthy. The Tibetan B type is presumed to represent the population
of the province of Khams in Eastern Tibet, and this is another inaccessible region 
on account of its mountainous character. There are only 15 male crania available 
and coefficients were calculated solely with the series which obviously resemble the 
Tibetan B series most closely. The only connection found of the order we are now 
considering is one with the Chukchis. 

This classification appears to us to be a reasonable one in every way. There is
clearly a close association between the geographical positions of the peoples 
compared and the degrees of racial affinity between them as measured by the 
reduced coefficients of racial likeness. The segregation of the Northern Mongolian 
types and of the isolated Andamanese, Aino, Chukchis and Tibetans of the B type 
was to be expected owing to the peculiar nature of the regions they occupy. The 
connection of the Indian peoples with those of the Orient through the Nepalese 
and Southern Tibetans, and the connection of the Northern Chinese with the races 
of Java and Burma through the Southern Chinese, Tagals and Dayaks, are in close 
accordance with geographical considerations and they are clearly not accidental. The 
arrangement we have been led to by using purely quantitative methods has some 
unexpected features and the most striking of these is the close association of the 
Aetas, who are generally classed as a negrito people having an entirely different 
origin from the non-negrito peoples of the Orient, with the Burmese and Javanese. 
We have also supposed that the Andamanese, who are also styled negrito, are closely 
allied to these two. If our deductions are correct then, as Dr von Bonin has 
suggested, the negrito hypothesis must be considered an entirely fallacious one in 
so far as it has been applied to Asiatic races. The evidence of the cranium is more 
likely to be a reliable guide in these matters than are characters such as stature 
and integumentary colours. The classification presented in this paper appears to be 
a suggestive one. The addition of further series may make it necessary to modify 
the details of the picture to a considerable extent, but we have every reason to hope 
that a reliable conception of the ethnic relationships and history of this and other 
groups of races will ultimately be reached by using statistical methods. 



A STUDY OF THE CRANIA IN THE VAULTED AMBULATORY
OF SAINT LEONARD'S CHURCH, HYTHE.

By BRENDA N. STOESSIGER, M.Sc. and G. M. MORANT, D.Sc.

(1) Introduction. The collection of skeletons at Hythe, representing the
population of a single English town, is almost the largest one which can be examined
to-day. Several anthropological descriptions of this material have been given at 
different times, but the only one which is of any permanent value whatever is that 
of Professor F. G. Parsons, published in 1908. Only a few measurements of 590 
crania are given in that study and they are clearly insufficient to lead to any reliable 
conclusions regarding the racial affinities of the inhabitants of Hythe in past times. 
The main object of the present paper is to present detailed measurements of a 
sample of 199 of the specimens selected from the total of about 1500. We are 
greatly indebted to the Rev. C. W. Chastel de Boinville, Vicar of Hythe, for 
granting us permission to undertake this work and for aiding us in other ways. 
We have also attempted to give a more comprehensive and accurate history of the 
human remains and a fuller account of the evidence bearing on the ethnic history 
of the population they represent than any previously published.

The technique used in taking the measurements, preparing the type contours 
and estimating the affinities of the racial types compared was precisely the same 
as that used in the study of the Spitalfields Crania*, and the reader is referred to 
that paper for particulars regarding these matters and for references to many of 
the sources of the comparative material. Some use was also made in the Spitalfields 
paper of our measurements of the Hythe skulls which are here published in detail. 

(2) The History of the Town and People of Hythe. There are two related
historical questions relevant to the present inquiry which have to be considered. 
The first concerns the Church of St Leonard and its human remains directly, and 
the second the nature and racial constitution of the population of Hythe at different 
periods. Unfortunately the town has not yet found its historian and no
authoritative ruling has been given on many of the points we have had to consider. The
compilation of the material on which anything approaching a comprehensive history 
might be based has, indeed, yet to be made. It is clear that the characters of the 
early population of Hythe cannot be estimated without some reference to the 
surrounding district, and the map (Fig. 1) shows the region with which we shall 
be principally concerned. Almost due west of St Leonard’s Church, and at a 
distance of 2 miles, is West Hythe, now a hamlet with the ruins of the Church of 

* G. M. Morant, with assistance from M. F. Hoadley: “A Study of the recently excavated Spitalfields
Crania.” Biometrika, Vol. xxiii. 1931, pp. 191–248.

St Mary the Virgin. Less than a mile beyond is Stutfall Castle with Lympne Castle 
and village to the north. Passing eastward from Hythe we come first to Seabrook, 
where a small stream which was once considerably larger now flows into the Royal
Military Canal, then to Shorncliffe Camp, formed during the Napoleonic wars, and 
finally to Sandgate, 2| miles from our point of departure, where there is a castle built 
in Henry VIII’s reign. The line from Stutfall Castle to Sandgate rises above 100 ft. 
in places, though the old town of Hythe, built on the side of the hill, was nearly
all below the present 50 ft. contour line. This worn down undulating ground was 



Fig. 1. 


formerly the escarpment of Lower Greensand deposits and it is known to have 
been higher, and to have had a steeper slope in parts, within historical times. 
There are records of landslides—called earthquakes by some of the earlier writers— 
which have affected it and Stutfall Castle was injured by a subsidence at some 
unknown date. The Greensand ridge comes down to the sea at Sandgate, but to 
the west it is separated from the present-day coast-line by a plain which begins at 
Seabrook, increases in width to the west and finally merges into Romney Marsh. 
The northern edge of this area, which is nowhere more than 20 ft. above sea-level, 
is marked roughly by the line of the Royal Military Canal. The old town of 
Hythe was situated half a mile, and Stutfall Castle is lj miles from the present 









shore. The coastal plain, like the marshes to the west of it, is marked on geological 
maps as alluvium, but, although few borings have been taken, there can be no 
doubt that it was formed by the combined action of river and sea. Its outline is 
known to have been modified to a large extent during historical times owing to 
both human and natural agencies, and these changes have had a profound effect on 
the population of the district at different periods. Several incompatible
explanations of the process which resulted in the formation of the eastern end of Romney
Marsh have been suggested*. The Rhee wall from Appledore to New Romney
was constructed in pre-Roman or Roman times, coins and other antiquities of the
occupation having been found in many parts of Romney Marsh which it protected.
An early theory, championed by Holloway†, was that the principal outflow of the
marsh was the river Limen at this time, and it is supposed to have had a wide
estuary immediately below the hills on which Lympne stands, while there was a
Roman port at that place. The sea is thought to have receded from Lympne at
the beginning of the seventh century and this led to the rise of West Hythe, at 
which place the river still debouched. Holloway supposes that by the end of the 
eighth century the Limen had become divided into two branches, one still flowing 
out at West Hythe and the other—the Rother—at Romney. It is conjectured 
that the bed of the Limen dried up some time during the next two or three 
hundred years, but traces of it were said to be still visible when Holloway wrote. 
According to another early theory, championed and elaborated by Lewin, there 
was no river or branch of the sea below Lympne Hill in Roman times, for water 
there would have been at a higher level than the marsh. He remarks that: 
“Roman remains are scattered over the whole of Romney Marsh, and may be
found in every field that is ploughed‡.” A great shingle spit is supposed to have
reached all along the border of this area from Lydd to Sandgate with one break 
between Lydd and Romney and another at Hythe. Behind the latter there was 
a narrow gut extending from West Hythe to Shorncliffe. The harbour formed in 
this way was kept scoured by three streams, of which one was the Seabrook, which 


* The question is dealt with in the following sources: William Somner, A Treatise of the Roman
Ports and Forts in Kent, 1693, pp. 41–61. William Holloway, The History of Romney Marsh from its
earliest Formation to 1837, 1849. Charles Roach Smith, Report on Excavations made on the Site of the
Roman Castrum at Lymne, in Kent, in 1850, 1852, pp. 39–45. Thomas Lewin, The Invasion of Britain
by Julius Caesar, 1859; The Invasion of Britain by Julius Caesar with Replies to the Remarks of the
Astronomer Royal and of the late Camden Professor of Ancient History at Oxford, 1862; “On the Position
of the Portus Lemanis of the Romans,” Archaeologia, Vol. xl. 1866, pp. 361–374. More recently
Dr T. Rice Holmes has discussed at considerable length the numerous theories which have been held
regarding the changes Romney Marsh has undergone in the past 2000 years (Ancient Britain and the
Invasions of Julius Caesar, 1907, pp. 532–552, 622–625 and 640–641). He gives numerous references
and the geological evidence is considered. It is curious that the most complete inquiries regarding these
matters have been made with the object of substantiating some theory or other (and often one which
is really of minor historical importance) relating to the Roman occupation. Lewin attempted to prove
that Julius Caesar landed near Hythe, and Rice Holmes only deals with the district in order to refute
this view. Holloway endeavoured to show that Anderida was at Newenden, beyond the west end of the
marsh, and not at Pevensey.

† Op. cit., pp. 14–28, 47–49, 52–56, 62, 65 and 102.

‡ Loc. cit., 1866, p. 366.




flowed into it from the north. These were unable to perform the work effectively 
and there was a gradual diminution in the size of the Hythe Haven*. The theory 
that there was a Roman port below Stutfall Castle is said to be wholly untenable. 
The castrum there was built to defend the marsh and the port was not there, or at 
West Hythe, but at Hythe. After an examination of a great deal of evidence, 
part of which had not been accessible to Lewin, Rice Holmes came to the following 
conclusions: “first, that the Rother did not, in the time of Caesar, enter the sea
at Lympne...; secondly, that the marsh was then closed at West Hythe Oaks
(half way between Hythe and West Hythe), and therefore that there was no
harbour at Lympne; thirdly, that the Rhee Wall had not then been built...;
fourthly, that the Portus Lemanis was a pool harbour extending from West Hythe
to a point nearly opposite Shorncliffe...†.” There is thus a substantial agreement
between the views of Lewin and Rice Holmes with regard to these matters.
Rather different conclusions have been reached by a number of other writers. 
Everyone admits that a branch of the sea came close to the hills along part of the 
line from Hythe to Stutfall Castle in Roman times, and it has been generally held 
that this estuary reached West Hythe and that the port was there, although 
there may have been no river running along the north of the marsh. 

There is a certain amount of historical and archaeological evidence relating to 
Roman times which bears directly on this district. Stutfall Castle is the Saxon 
name of a building which is entirely Roman. The following sources have been 
supposed to refer to this castrum, or to the immediate neighbourhood‡: the
Geography of Ptolemy refers to the “New Port” or “New Haven” which has been
identified with Lympne, though this view has been contested; the Antonine
Itinerary, assigned to the second or third century, gives the distance from London
to the “Portus Lemanis” and between other towns on the way, treating it as one
of the three Kentish ports, the other two being Richborough and Dover; the
Notitia Dignitatum, compiled at the beginning of the fifth century, refers to
“Lemanis” as the place where an officer of a detachment of the Turnacenses held
a garrison under the command of the Count of the Saxon Shore; the Peutinger
Table, assigned to the latter part of the fourth century, mentions the same place
with the symbol of a gateway between towers signifying a fortified city or port;

* Some of the sixteenth and seventeenth century maps of Kent show the town of Hythe with
St Leonard's Church above, a few houses below it, a stream running down on either side of these
buildings and the two meeting in a pool, with one or two islands in it, which was situated between the
town and the sea. According to Lewin (op. cit., 1862, p. lviii) the position of the mouth of this pool
could be clearly seen when he wrote, the hollow having been filled in a few years before.

† Op. cit., p. 652.

‡ In addition to the sources already cited, the following are the more important ones dealing with
the district in Roman times: Charles Roach Smith, The Antiquities of Richborough, Reculver and Lymne,
in Kent, 1850. William Henry Black, “On the Identification of the Roman Portus Lemanis,” Archaeologia,
Vol. xl. 1866, pp. 375–380. George E. Fox, “The Roman Coast Fortresses of Kent,” The
Archaeological Journal, Vol. liii. 1896, pp. 352–375. R. F. Jessup, The Archaeology of Kent, 1930.
Francis Hobson Appach, Caius Julius Caesar's British Expeditions from Boulogne to the Bay of Apuldore,
and the subsequent Formation geologically of Romney Marsh, 1868. Appach attempted to prove that
Romney Marsh was not formed until the middle of the fifth century a.d.





the anonymous Geographer of Ravenna mentions “Lemanis.” All later writers
agree in identifying the fort of Lemanis with Stutfall Castle. The Portus Lemanis
is only mentioned clearly in the Antonine Itinerary. Somner thought that it was
at Romney*, but all later authorities reject this view and place it either below 
Stutfall Castle, or at Hythe, or somewhere between these two places. The theory 
that it was at West Hythe appears to be the most plausible. The Roman road 
from Canterbury, known as Stone Street, is still well defined for over ten miles 
and it runs straight towards Shipway Cross (see Fig. 1) and the modern hamlet of 
West Hythe. The first excavation of Stutfall Castle was undertaken by Roach 
Smith in 1850 and it was very incomplete owing to the lack of sufficient funds. 
The site is a large one covering more than ten acres and most of the walls above 
ground had been removed at an early date and some of the materials were used in 
building Lympne Church and Castle. Landslides have distorted the remains. The 
chief entrance was found to be on the eastern side. The only building of interest 
discovered inside the boundary walls was one which had been provided with
hypocausts and two fire-places. It has been suggested that this was either an officer’s
house or the baths of the station. Tiles which had been previously used in 
constructing another building and an altar forming part of the foundations of a 
gateway bore inscriptions which are interpreted as Classiarii Britannici, or British
marines. The altar was covered with barnacles, proving that it had been washed 
by the sea at one time. The age of the fortress could not be placed earlier than 
Constantine, owing to structural details, and it was hence one of the last Roman 
stations built along the south coast. It was also one of the largest and most 
important. At some date, probably before Constantine, a division of the British 
fleet must have been stationed in the vicinity, and later the large castrum was 
built and occupied at one time by Turnacensians (from Tournai). In 1894 further 
excavations on the site of the castle were carried out by Professor (later Sir Victor) 
Horsley†. The absence of the south wall had been taken to indicate that there
was a river or sea defence on that side, but vestiges of the southern wall were 
found at this time, proving that the castrum was of the usual quadriform pattern. 
Numerous other Roman remains have been found in the locality with which we 
are concerned. Leland says in his account of Stutfall: “About this Castel, yn tyme
of mind, were found Antiquites of mony of the Romaynes‡.” The numerous
remains found in the adjacent marsh have been referred to (p. 137 above) and
there were potteries at Dymchurch less than three miles away. Roach Smith§
says that there was no record in 1850 of Roman remains having been found in the 
vicinity of the castrum, and its burial-place had not been discovered, but he refers 
to the discovery of a building of this period less than two miles away to the
north-west. In attempting to trace the course of the Roman road from the villas he

* Op. cit., pp. 37–62.

† The only account of these is a paragraph reporting a lecture in The Athenaeum, 22nd September,
1894, p. 394.

‡ Thomas Hearne’s edition of The Itinerary of John Leland the Antiquary, Second edition, Vol. vii.
1744, p. 182.

§ Op. cit., 1850, pp. 262–264.




discovered at Folkestone in 1924 to Lympne, S. E. Winbolt mentions a number of 
finds in the locality*; these are: “a coin of Marcus Aurelius found (1904) 18 in.
down in Hill Crest Road, Hythe; Roman burials found in the quarry at the corner 
of Hill Crest Road and Castle Road (both of these at the top of the slope above 
North Road); Roman remains found in Harp Wood in 1874...and Roman remains 
in the glebe land of Lympne Vicarage.” It is inferred that the old road went 
along the line of North Road immediately above St Leonard's Church. The corner 
of Hill Crest Road and Castle Road is less than 200 yards from the church. Harp 
Wood is nearly a mile away to the north-west†. There is another record of Roman
remains having been found in Hythe. In his book published in 1862 Lewin says 
that: “in excavating for a drain at the east end of Hythe, we came to the
foundations of a Roman building in the main-road, about two feet under the surface, and
turned up at the same time a great quantity of broken Roman pottery‡.” The main
road runs below St Leonard’s Church. This writer attempted to prove that Caesar 
landed at Hythe and that the subsequent battle with the Britons took place “in 
the field to the south and east” of the town. We are told that “on the triangular 
level there, human bones, and unquestionably of men slain in battle, are brought 
to light. They are exclusively the bones of grown men, buried only a few feet 
below the surface, and without any care, in all conceivable positions. I do not 
affirm that these are the remains of the Britons who fell in the conflict with 
Caesar, for they may be the bones of either Saxons or Danes who afterwards 
landed at the same place...§.” Mr Elliott is cited as having supplied this interesting
information||. The land in question is said to be distinct from the great shingle 
bed and below high-water mark. It would appear to have been under the water 
of the gut which Lewin thought to be the port in Roman times. He recognises 
this and observes that it was certainly dry at low water. The evidence reviewed 
above seems to indicate conclusively that there was a considerable population at 
Hythe during part, at least, of the occupation. Stutfall Castle must have been an 
important station with a large garrison towards the end of that period, and it is 

* Roman Folkestone, 1925, pp. 158—160.

† The site near Harp Wood is marked on the 6-inch Ordnance map.

‡ Op. cit., 1862, p. cxxi.

§ Ibid., pp. lxxiii—lxxiv.

|| The accuracy of Lewin as a reporter or observer is certainly not above suspicion. On p. 92 of the
second edition of his book he says that the sea flowed below Stutfall Castle in ancient times up to the
very base of the hill, “as is proved incontestably by the fragments of ships and anchors which have
been dug up....” Rice Holmes (op. cit., p. 622) points out that on pp. lxviii and lxix of the same book
it is denied that the sea was ever below Stutfall Castle in Roman times and the evidence of the fragments
of ships and anchors is entirely ignored. Lewin’s lack of consistency is, in fact, far more
thoroughgoing than this, however. In his later paper in Archaeologia (pp. 864—865) we are told that,
if the Portus Lemanis had been at the foot of Lympne Hill, “we should expect to find at least some
vestiges, however faint, of the port itself. The ground there has been long under cultivation, but I have
never heard or read (though I have often inquired) that any remnant of a pier or sunken vessel, or even
any anchor or other part of a ship’s tackle, was ever discovered in this part. Again, had the port existed
here, the adjacent parts on the hill side must have been covered with wharves and warehouses and the
dwellings of the seafaring population; but, with the exception of Stutfall itself, no signs of population
here show themselves.”



B. N. Stoessiger and G. M. Morant




probable that in the earlier centuries, at least, there was one of the most important 
ports of the country situated somewhere between the castle and the modern town. 
Mercenaries from Flanders are known to have been quartered in the castrum at 
one time, but there is no other indication of the racial constitution of the population 
during this epoch. 

On turning to the history of the district in Saxon times we are at once 
confronted with another problem which has aroused controversy among historians. 
Hasted gives the following account of the matter: 

“During their contests, in the year 456, a bloody battle was fought near this
place, between Folkestone and Hythe, between the Britons under K. Vortimer,
and the Saxons, who were retreating hither before him, after the conflict he had
with them on the banks of the Darent, in the western part of this country.
According to some writers, this battle was not fought near Folkestone, but in
Thanet; but as the Britons drove the Saxons, after the battle, into that island, 
the place of conflict could not be there. Nennius and others write, that it was 
fought in a field on the shore of the Gallic sea, where stood the lapis tituli, which
Camden, Usher, and Baxter, caught by the sound of the name, take to be Stonar,
in that island; but Somner, Gale and Stillingfleet, instead of that, read, in a
correction of their own conjecture, lapis populi, or Folkestone. This place certainly
suits best with the description of it, on the shore of the Gallic sea; and what adds 
strength to this, are the two vast heaps of sculls and human bones, piled up in two 
vaults under the churches of Folkestone and Hythe, which, from the quantity of 
them, could not but be from some battle; and, from their whiteness, appear to 
have been all bleached by lying for some time probably on the sea shore; and 
many of the sculls have deep cuts in them, as made by some heavy weapon. 
Probably those at Hythe were of the Britons, and those at Folkestone of the 
Saxons, who were pursued hither by them*.”

Hasted was not the originator of this theory which is supposed to account for 
the existence of the bones at St Leonard’s, though he does not acknowledge the
fact. The authority of this most voluminous, though perhaps not most accurate, 
historian of Kent was very generally accepted without question for many years 
after he wrote and, as he offers only one explanation with regard to this matter, 
his tale—though really absurd—was often quoted as if it stood for the final verdict 
of posterity. We are asked to believe that a battle took place somewhere between 
Folkestone and Hythe and that there were some thousands on each side slain. 
Some time shortly afterwards, presumably, the corpses were sorted out, all the 
Britons being collected in one place and all the Saxons in another. They were 
then left until the bones had been bleached and at some time at least 700 years 
later the skeletons of the Britons were transported to Hythe and those of the 
Saxons to Folkestone, and it should be remembered that these two towns are seven 
miles apart. If any denial of such a preposterous theory is needed it is found in 

* Edward Hasted: The History and Topographical Survey of the County of Kent, Vol. iii. 1790,
p. 878, footnote («). A similar account is given on p. 420 of the same volume and on p. xxviii of Vol. i.
1778.





the fact that there are considerable numbers of women and children represented 
by the bones at Hythe. The only documentary evidence which could suggest that 
a battle was fought in the district at this time is a short passage in the Historia 
Brittonum, a composite work of which parts were written as late as the middle of
the ninth century*. The lapis tituli on the shore of the Gallic sea is not identified
with Folkestone by modern scholars. As far as we have been able to ascertain, the
first writer to suggest that the bones at Hythe might possibly be associated with
the fifth century battle was John Harris†. His conjectures regarding the matter
are far more sober than those of Hasted. The origin of the remains is said to be
unknown, but two guesses may be ventured on; the first is that they came from
graveyards in the town and the second that they were “collected and piled up
here on some eminent occasion.” Of the two “eminent occasions” suggested one is
the battle between the Britons and Saxons. The bones may be those of the soldiers
of the two armies, “whose bodies fell herabouts and at Folkestone” and the
existence of another ossuary there renders this supposition more probable‡. This
supposition may be dismissed as being entirely untenable§. 

The following inscription was hanging in the ambulatory passage of St Leonard’s
at the beginning of the nineteenth century: 

“From an antient History of England brought down to 1658. 

A.D. 853. The Danes landed on the coast of Kent, near the town of Hyta (now
Hythe).... They were...at length defeated by Gustavus, the governor of Kent, who
assembled the greatest part of the inhabitants, assisted by the army of Ethelwolf,
then king of Britain, who met the invaders near Hyta, when the Danes...being 
overpowered fled to their vessels, then on the coast near the above town; but 
being closely pursued, they made a bold stand near the water, where the battle 
became general, and tradition reports that upwards of 30,000 fell in the conflict. 
After the battle, the Britons...returned to their homes, leaving the slain on the 
field of battle; where being exposed to the different changes of the weather, after 
a length of time the flesh rotted from the bones, which were at length collected 
and piled in heaps by the inhabitants, who in time removed them into a vault in 
one of the churches of Hyta. D. Thomson, A.D. 1797||.”

* H. Monro Chadwick: The Origin of the English Nation, 1924, p. 36.

† The History of Kent, Vol. i. (the only volume printed) 1719, p. 152.

‡ Harris says that he had been told that in digging a grave at Folkestone Church a vault was found
“where great quantities of bones, like these (at Hythe), were piled up.” Mention of this vault was made
by Thomas Philipott (Villare Cantianum, 1659, p. 96). Local interest in ossuaries was likely to
be lively and there are several references to visitors who attempted to locate this one, and to see its
contents, in the eighteenth and nineteenth centuries, but without success. S. J. Mackie (A Descriptive
and Historical Account of Folkestone and its Neighbourhood, 1856, p. 105) says that the crypt containing
bones was found “a few years since” under the north chancel.

§ In describing the final stages of the conquest of the Jutes in Kent, John Richard Green (The
Making of England, 1881, pp. 39—40) supposes that the last years of Hengest were occupied in reducing
the fortresses on the southern coast and that of Lympne is mentioned as being the last to fall. There
appears to be not a shred of documentary or archaeological evidence, however, to show that Stutfall
Castle was ever occupied, or defended, in the second half of the fifth century.

|| From a letter in The Gentleman's Magazine, Vol. 72, 1802, pp. 1001—1008.





The earliest reference to this inscription we have been able to find is one made 
by Charles Seymour in his survey of the county published in 1776*, so the one
above was probably copied from an earlier original. Another copy with slightly
different wording was made in 1812 and this was framed and may be seen hanging
in the ambulatory to-day. The date of the battle is given as 848 and the signature
“D. Thomson, a.d. 1797” is omitted†. Several other writers have referred to, or
given transcripts of, this record and the dates 842 and even 143 have been given
in error in some of these accounts. D. Thomson was probably the transcriber of
the original account. We have not been able to identify the “antient History of
England (or ‘Britain’ in the 1812 copy) brought down to 1658,” or to find any
author who has suggested that a battle with the Danes was fought at or near
Hythe in the middle of the ninth century. The invaders were defeated by
Ethelwulf, leading the West Saxons, at Aclea (Ockley) and it is just possible
that this is the event referred to. The Anglo-Saxon Chronicle gives the years 
851 and 853 in different MSS. and the former is now generally regarded as being 
the authentic one. We can only conjecture that the authority of a particularly 
imaginative ancient historian led to a wildly improbable theory when the bones 
were associated with this particular battle. The theory was generally accepted 
until recent years. 

During the second Danish invasion, and after the death of Ethelwulf, a considerable
force was landed at the mouth of the Limen. One writer has suggested
that this was at Hythe, but his theory has not been generally accepted, even by
those who believe that there was a river estuary below the town in post-Roman
times. Two chronicles mention the Portus Lemanis in this connection, but the
Anglo-Saxon Chronicle states that the landing in 893 was at the mouth of the
Limen at Appledore‡. Two charters of 732 and 833 refer in almost identical terms
to a piece of land bounded on the south by the Limen and having on the north the
“Hudan Floet.”‡ The last has been identified with West Hythe, but this is merely
conjectural. According to an early account which has been repeatedly copied, the
manor of Hythe was given by King Alfred§ to the Priory of Christ Church, Canterbury,
in 849, but this appears to be another spurious tale. This transference was made, 

* A New Topographical, Historical, and Commercial Survey of the Cities, Towns, and Villages of the
County of Kent, p. 477.

† S. J. Mackie tells us that the memorial—probably meaning a copy of the original—was “written
in a fine hand by the favourite pupil of a local pedagogue” (op. cit., p. 180). He also refers to “Roman
and Saxon pottery and mediaeval coarse earthenware” found in restacking the pile of bones at Hythe
and in his possession when he wrote. Thomas Wright (Wanderings of an Antiquary, 1854, p. 120) also
refers to the pieces of pottery found, “some of which are of a very early character, and appear to me
like fragments of Anglo-Saxon burial urns. Among them were some fragments of glazed mediaeval
pottery of a later period—probably of the sixteenth century....”

‡ See references and discussions by Rice Holmes (op. cit., pp. 539—542) and Holloway (op. cit.,
pp. 18—20).

§ Alfred was born in 849. Kilburne (1659) appears to have been the first to mention this grant and
it has been referred to—without references, as usual—by many later writers. Hueffer (The Cinque
Ports, 1900, p. 191) gives the date as 889, which is more reasonable, but he gives no authority for the
statement.





however, in 1036 by Halden, or Halfden, a Saxon thane*. There appears to be only
one other undisputed reference to the town before the Conquest. Two MSS. of the 
Anglo-Saxon Chronicle give an account of the revolt of Earl Godwin in 1052. He 
sailed with Harold from the Isle of Wight and visited Pevensey, Romney, Hythe 
(Hide or Hythe), Folkestone, Dover and Sandwich and “even took all the ships that 
they found, which might be of any value, and hostages as they went, and then betook 
themselves to London...†.” All the towns, except Hastings, which later became
the Cinque Ports are mentioned here together with Pevensey and Folkestone and 
it is probable that all the more important Kentish ports of the time were visited. 

Few Anglo-Saxon remains have been discovered in the neighbourhood of Hythe. 
A disused quarry to the north-west of the town is marked on the Ordnance map as 
the site of finds of this period in 1870, but no other record of the event appears to 
have been made‡. Burial places have been excavated at Folkestone, and near Lympne.
Romney Marsh was probably uninhabited during this period, though the east and 
north of Kent were more densely populated by the invaders than any other part 
of England§. The origin of the town of Hythe, as of most of like antiquity, is 
obscure, but it is at least evident that it was a port of some importance before the 
Norman Conquest. 

Further evidence of the early importance of the town is furnished by the 
Domesday Book; the following are the references to Hythe:

“Hugh de Montford holds, of the Archbishop, Salteode....To this Manor pertain
two hundred and twenty-five Burgesses in the borough of Heda.”

“The same Archbishop holds Leminges in demesne....Thereto pertain six
burgesses in Hede¶.”

The aggregate value of the Borough of Hythe and Manor of Saltwood was 
estimated at sixteen pounds in the reign of Edward the Confessor and eight when 
it was transferred. At the time the survey was made, its total produce was said to 
be twenty-nine pounds six shillings and four-pence. It is generally recognised 
to-day that no reliable estimate of population can be deduced from the Domesday 
Book. Most of the particulars relating to the towns were probably only entered
because of their bearing on the fiscal rights of the Crown. We find that Dover, 
Sandwich and Romney are mentioned as providing sea service, but Hythe and 
Hastings are not. In the account of Kent, reference is made only to 816 burgesses, 

* Hasted: op. cit., Vol. iii. p. 141. Dugdale: Monasticon Anglicanum, 1682, Vol. i. p. 21. This is
the earliest reference to Hythe in Saxon times, mentioned by Hasted. The Rev. H. D. Dale, sometime
Vicar of Hythe, says that it is the first undisputed reference to the town (“Notes on Hythe Church,”
Archaeologia Cantiana, Vol. xxx. 1914, p. 275).

† Thorpe’s edition, 1861, Vol. ii. p. 153.

‡ See the article on “Anglo-Saxon Remains” by Reginald Smith in The Victoria History of the
Counties of England, Kent, Vol. i. 1908.

§ A map showing the distribution of Anglo-Saxon burial places for the whole of England has been
given by E. Thurlow Leeds: The Archaeology of the Anglo-Saxon Settlements, 1913, p. 19.

¶ Rev. Lambert Blackwell Larking: The Domesday Book of Kent. With Translation, Notes and
Appendix, 1867, pp. 103 and 105.





438 of these belonging to Canterbury, 231 to Hythe and 135 to Romney. But the 
county survey opens with a long description of Dover which does not detail a single 
burgess in the town, and there may have been similar omissions for other places. 
In spite of the meagre notice of Hythe in this record, it is reasonable to infer that
the town was one of the largest and most important in Kent in the eleventh century.


Further evidence that Hythe was an important centre before the Conquest is 
furnished by the fact that it was one of the original Cinque Ports*, together with 
Sandwich, Dover, Romney and Hastings, and all these were amongst the most 
flourishing and populous English towns in mediaeval times. The origin of what was 
once the most important corporation in England is obscure owing to the loss of the 
earliest charters in the sixteenth and seventeenth centuries. The earliest extant 
is that of the sixth year of Edward I (1278) and it refers to earlier ones granted 
by Edward the Confessor, William I and other kings. The Ports grew in importance 
and they were most flourishing under Edward I, who fostered them. Disasters 
from which they never recovered began in Edward Ill’s reign, but until the end of 
the fifteenth century the corporation had to furnish the Crown with nearly all 
the ships and men needed for the State. The primary cause of their ultimate 
decline was the silting up of their harbours, and Romney and Hythe became affected 
before Sandwich and Dover. The “Two Ancient Towns of Winchelsea and Rye”
became members with the same status as the original five; and other subordinate 
members, or limbs, were attached at various times to the seven principal Ports. 
Most of these affiliated groups were made up by five, or more, small towns, but 
West Hythe was the only one ever attached to Hythe, and it was a non-corporate 
member within the municipal jurisdiction of the larger town. The services 
demanded from the Ports in return for the immunities and privileges they enjoyed 
are known for some years and they show little variation during the earlier centuries. 
In 1229 a total of 57 ships had to be provided, each manned by 21 men and a boy. 
This number of ships was assembled from the Ports at several later dates, but the 
crew in each had to be increased to 24 and later to 34. The following list, showing 
the proportions of the total fleet which were contributed by Hythe in different 
years, is compiled from figures given by Jeake and Burrows: 


Year    Total number of ships provided by       Number provided
        the Cinque Ports and their members      by Hythe

1229    57                                      5
1294    50                                      3
1300    30                                      4
1347    106 (?)                                 6
1360    67                                      5
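
The proportions implied by this list can be worked out directly. The following is only a minimal sketch in Python, not part of the original memoir. It assumes the figures as listed, with the 1229 total taken as the 57 ships stated in the text and the 1347 total, which the source itself queries, used as printed; the names FLEET and hythe_share are illustrative.

```python
# Figures from the list above: year -> (total ships, ships from Hythe).
# Assumptions: the 1229 total is taken as 57, the number stated in the text;
# the 1347 total is queried in the source and is used here as printed.
FLEET = {
    1229: (57, 5),
    1294: (50, 3),
    1300: (30, 4),
    1347: (106, 6),
    1360: (67, 5),
}


def hythe_share(total, hythe):
    """Return Hythe's contribution as a fraction of the whole fleet."""
    return hythe / total


# Fraction of the fleet supplied by Hythe in each listed year.
shares = {year: hythe_share(*row) for year, row in FLEET.items()}

for year, share in sorted(shares.items()):
    print(f"{year}: {share:.1%}")
```

On these figures the share of Hythe lies between about 6 and 13 per cent. of the fleet, the largest proportion falling in 1300, when the total demanded was smallest.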


* The following are the most important sources of information relating to the Cinque Ports: Samuel
Jeake, Charters of the Cinque Ports, Two Ancient Towns and their Members. Translated into English,
with Annotations Historical and Critical thereupon. This book was written in 1678 and published
posthumously in 1728. It is still by far the most important source of first-hand information on the
subject. Sir Nicholas Harris Nicolas, History of the Navy from the Earliest Times to 1422, 1847, 2 vols.
Montagu Burrows, Cinque Ports, 1888. This is by far the most authoritative modern work on the Ports,


The total number in the fleet at the Siege of Calais in 1347 was 710 ships and 
14,151 men, so the Ports were only able to supply a proportion of the war-time 
strength needed at this time. The Cinque Ports had supplied only a proportion, 
too, of the 200 English vessels which took part in the Battle of Sluys (1340). In 
the following year the shipping demanded from Hythe and Romney was not 
ready and their franchises were ignored for a time. In 1344 an exceptional demand 
was made and all the Ports received, or were threatened with, a similar treatment. 
About 1412 Hythe suffered from a series of disasters and the “services” of the
town were remitted for the next five occasions on which the Ports might be 
summoned. 

The Court of Shepway, or Shipway, was the chief law-court of the corporation. 
This “portmote,” or parliament, made by-laws for the Cinque Ports as a whole and 
it acted as a Court of Appeal. It was held originally, and for some centuries, at 
Shipway Cross which is north of West Hythe and less than a mile from Stutfall 
Castle. Before 1597 the installations of all the Lord Wardens took place here. The 
meetings are believed to have been held in the open, like all early Teutonic 
assemblies. Ordinary business was conducted by the courts of brotherhood (“brodhall”)
and guestling which generally sat at Dymchurch or Romney. The fact that
Shipway Cross should have been chosen as the meeting-place of the most important 
gatherings is clear evidence of the importance of the Hythe district in early times. 

According to local tradition, Hythe was ravaged by the French seven times*. 
There appears to be only one historical record of these incursions, however. Hythe 
was an unwalled town and its churches were probably the only buildings which 
could offer substantial protection†, but, as far as is known, it suffered far less from
pillage than did Winchelsea and Rye which were well defended. In 1295 a French 
fleet of 300 ships, drawn principally from Mediterranean ports, sailed up the Channel. 
One vessel got ahead of the others while exploring the land and it grounded near 
Hythe. After enticing the crew a short way inland, the townsmen turned on them and 
killed them to a man. There were 240 foreigners slain and their ship was burnt‡.
The remainder of the fleet withdrew, but it returned later and burnt Dover before 
being finally repulsed. The earliest account of the bones in St Leonard’s Church 
mentions a tradition that they are those of Frenchmen slain on the coast, and it is 

but it was written for the general reader and there is not a single exact reference in the whole volume. 
It is to be regretted that Professor Burrows did not incorporate the results of his researches in a more 
technical form. Ford Madox Hueffer, The Cinque Ports, 1900. This is described by the writer in the
preface as “a piece of literature pure and simple” and, though presenting some new facts, it is far from
accurate in many details. There seems to be a real need for a comprehensive history of one of the most
important English institutions of mediaeval times.

* This statement is made by Hueffer (op. cit., p. 193). The writer makes another statement for
which we have not been able to find any confirmatory evidence. He says (p. 192) that for the whole
duration of the cult of St Thomas, Hythe was a principal port of entry for foreign pilgrims. St Thomas
was believed to have declared oracularly that Hythe was the safest port for those sailing to Boulogne.
The town was ecclesiastical property at this time.

† In the town records for 1412 mention is made of carrying guns from the bridge to the Church on
sleds (Fourth Report of the Royal Commission on Historical Manuscripts, 1874, p. 434).

‡ The story is given by Henry Knighton (De Eventibus Angliae) who wrote in the following century.





probable that this occasion is referred to. A later account refers to it directly and 
suggests that the collection was started in this way, though there must have been 
additions to it later (see pp. 155—157 below). 

The primary cause of the decline of Hythe was undoubtedly the silting-up of 
its harbour. There is a considerable amount of evidence bearing on this matter and 
it has been interpreted in different ways. The theories relating to the nature of 
the coast in Roman times have been considered above and the most plausible one 
postulates the existence of a pool harbour extending from a point below West Hythe 
to Seabrook. There were centres of population at this time at West Hythe and 
where the present town of Hythe stands, which would have been the middle of the 
harbour. From the fourteenth century onwards many attempts were made to 
enlarge and prevent the further choking of this pool*. Leland visited the town
sometime between 1535 and 1543, and he gives the following account of it:

“Hithe hath bene a very great towne yn lenght, and conteyned iiii paroches that 
now be clene destroied, that is to say S. Nicolas paroche, our Lady paroche, 
S. Michael's paroche, and our Lady of Westhithe, the which is with yn lesse than 
half a myle of Lymme Hille. And yt may be well supposed that after the haven of 
Lymme, and the great old town ther fayled, that Hithe strayte therby encresed and 
was yn price. Finally to cownt fro Westhyve to the place wher the substans of the
towne ys now ys ii good myles yn lenght, al along on the shore to the which the se
cam ful sumtyme, but now by bankinge of woose and great casting up of shyngel
the se ys sumtyme a quarter, sumtyme dim. a myle fro the old shore. In the tyme
of King Edward the 2, there was burned by casuelte xviii score howses and mo,
and strayt folowed great pestilens, and thes ii thinges minished the town. There
remayne yet the ruines of the chyrches and chyrch yardes. It evidently apereth 
that wher the paroch chirch is now was sumtyme a fayr abbay. Yn the quire be 
fayre and many pylers of marble, and under the quier a very fair vaute, also a faire 
old dore of stone, by the which the religius folkes cam yn at mydnight. In the top 
of the chirch yard is a fayr spring, and therby ruines of howses of office of the 
abbey; and not far of was an hospital of a gentilman infected with lepre....The 
havyn is a prety rode, and liith meatly strayt for passage owt of Boleyn (Boulogne). 
Yt croketh yn so by the shore a long, and is so bakked fro the mayn se with casting 
of shinggil, that smaul shippes may cum up a larg myle toward Folkestan as yn a 
sure gut†.”

“From Hithe to Holde Hithe, alias West Hithe, about 2 myles, Mastar Twyne 
saythe that this was the towne that was burnid alonge on the shore, where the
ruines of the churche yet remayne‡.”

* References to this matter are given in the Fourth Report of the Royal Commission on Historical
Manuscripts, 1874, pp. 429—489. A selection of the town records is printed in this volume; the
majority of the others have not been published. They relate principally to the fifteenth century and
are by far the most important collection illustrating the history of Hythe as a Cinque Port and its
municipal and general condition at that time.

† The Itinerary of John Leland in or about the Years 1535—1543. Edited by Lucy Toulmin Smith,
1909, pp. 64—65.

‡ Ibid., p. 46.



Writing in 1570, Lambarde declares that he could find nothing to account for 
the early greatness of Hythe. He concludes that: “either the place was at the first of
little price and for the increase thereof endowed with privileges or (if it had been
at any time estimable) that it continued not long in that plight*.” The earliest
port is said to have been at Lympne, but the sea deserting that part it moved to 
West Hythe until it, too, became land-locked and Hythe took its place: “which 
now standeth indeed, but yet without any great benefit of the sea, forasmuch as at 
this day the water floweth not to the town by half a mile or more.” 

Writing shortly after Lambarde, Camden in 1586 gives a similar account of the 
matter. “Nor,” he writes, “is it very long since its (Hythe’s) first rise, dating it 
from the decay of West Hythe, which is a little town hard by to the west, and was 
a haven, till, in the memory of our grandfathers, the sea drew off from it; but both 
Hythe and West Hythe owe their origin to Lime, a little village adjoining, formerly 
a most famous port before it was shut up with the sands that were cast in by 
the sea†.”

The date when the harbour ceased to be used can be determined with some 
certainty. In the Jurats Account Book for 1412 there are frequent references to a 
“Hollandyr” who was evidently employed in order that he might survey and 
preserve the harbour. According to a survey made in 1566, there were: “creeks 
and landing-places 2; th’on called the Haven, within the liberties; th'other called 
the Stade, without the liberties. It had of shipping, 17 tramellers of five tunne, 
seven shoters of 15, three crayers of 30, four crayers of 40; persons belonging to 
these crayers and other boats, for the most part occupied in fishing, 160‡.” The
creek must have become too small to be navigated by any but the smallest vessels 
shortly after this date as the contemporary maps suggest. Some attempts to keep 
it open were still being made more than a century later, however. Dr Wallis wrote 
in 1701: “At Hythe in Kent (which is one of the Cinque Ports) there was (in our 
fathers time) a convenient harbour for small vessels; which is now swarved up. 
Several attempts have been made to recover the harbour, but with small success§.” 

The question when the port moved from Lympne to West Hythe and from West 
Hythe to Hythe, if this transference ever did take place, is obviously one which 
cannot be answered so decisively. Camden says that the second of these movements 
took place not “ very long since.” Having, apparently, no better evidence to go on 
than the three sixteenth century accounts quoted above, Hasted concluded that 
West Hythe was once part of Hythe itself and that the two were connected by a
number of straggling houses. He also adopts another suggestion made by some 

* William Lambarde: A Perambulation of Kent: conteining the Description, Hystorie, and Customes
of that Shyre, collected and written (for the most part) in the yeare 1570, 1576, p. 159.
† William Camden: Britannica Antique.

‡ This passage is contained in a survey of the maritime parts of Kent made in the eighth year of
Elizabeth’s reign. It is quoted by Hasted (op. cit., Vol. iii. pp. 412—413) from the Dering MSS.

§ “Of the Isthmus between Dover and Calais,” Phil. Trans. No. 275. The “stade” was used for
beaching boats after 1700, as at Hastings, but there was nothing which could possibly be called a
harbour by then.





earlier writers and states as a fact that West Hythe was the original Cinque Port.
These ideas have been adopted, or elaborated, by many of Hasted’s successors: some
of them suppose that West Hythe was the original Port and most write confidently
of a town two miles in length*. These conjectures appear to be quite fanciful. There
were once at least five churches in the locality and all these were probably built in 
early Norman times. Three, including St Leonard’s which was almost certainly the 
largest, were situated within the boundaries of modern Hythe, one was mid-way 
between Hythe and West Hythe and the last, which was certainly small, was at West 
Hythe. Leland writes,“Holde Hithe, alias West Hitheand “East Hithe” was occa¬ 
sionally used by other writers, but the general custom from earliest times was to refer 
to these places by the names which they bear to-day. Lewin says: “I cannot find any 
authentic record that West Hythe was ever anything more than a suburb of Hythe f.” 
This appears to us x> be the only safe conclusion to draw from the evidence, though 
we should not use the word suburb as it is extremely unlikely that the two were 
ever connected by buildings. It is practically certain that West Hythe was not the 
original Cinque Port. With its favourable position in the middle of the harbour, 
the present site of the town, where there is known to have been a Roman settlement,
may well have become the most important centre in the neighbourhood from
the time at the end of the occupation when Stutfall Castle was deserted.

The lords of Hythe in mediaeval times were the monks of Christ Church and the 
Archbishops of Canterbury. Professor Burrows remarks that this town and Romney 
were the nautical outlets “for a body of settlers scattered over a considerable space... 
who were for the most part tenants of the monks endowed by the Kings of Kent, 
rather than the centres of a fishing and trading people gathered round the ports which 
gave access to the interior of the island and linked it with the Continent‡.” One would
expect to find in such a town that the number of ecclesiastical foundations was out 
of proportion to the size of the population. Leland says that a “fayr abbay” once 
stood where the parish church (St Leonard’s) is now, but nothing whatever is known 
about this and it is extremely probable that it never existed. The ruins in the 
churchyard may have been those of other buildings, but there is no other record 
referring to them. The sites of the churches of the “iiii paroches that now be clene 
destroied” have all been identified and there may have been another one§. Two—or 
three, if the doubtful one is included—are practically within the modern limits of
the town and hence close to St Leonard’s. Another is midway between Hythe and 
West Hythe and the last is at West Hythe. This last church is in ruins to-day and it is 
the only one of which there are any remains still above ground. It had fallen into disuse 
before Leland’s time and all the others were in ruins by 1400. The church at West

* That the town was once of such a size is the generally accepted view to-day. It was adopted by 
Professor Burrows (op. cit., p. 84).

† Loc. cit., 1866, p. 866.

‡ Op. cit., pp. 44—45.

§ Rev. G. M. Livett: “West Hythe Church and the Sites of the Churches formerly existing at
Hythe,” Archaeologia Cantiana, Vol. xxx. 1914, pp. 251—262. Several references to the demolished
churches will be found in the records printed in the Fourth Report of the Royal Commission on Historical
Manuscripts. The sites of two of these in Hythe are shown on the 6-inch Ordnance map.





The Hythe Crania 

Hythe is known to have been small and it is probable that St Leonard's was always 
the largest and most important in the district. There is said to have been no burial- 
ground at West Hythe; one of the disused churches in Hythe (St Nicholas) was 
surrounded by a graveyard from which bones have been taken in recent years and 
there seems to be no sufficient evidence to show whether the others had grounds of 
their own, or not. It is not known when the leper Hospital of St John was founded, 
but it was in existence before 1336. In 1562 it contained eight beds. In 1336 
Hamo, Bishop of Rochester and a native of Hythe, founded the Hospital of St 
Andrew for ten poor persons. This was transferred to another site in 1342 and it 
became known as the Hospital of St Bartholomew. Thirteen poor persons were 
housed in it then and there is believed to have been a chapel, and possibly a burial- 
ground attached. The two hospitals are used to-day as almshouses.

The Church of St Leonard was founded in early Norman times and it was enlarged 
considerably in late Norman times and again in the thirteenth century. It is one of 
the finest parish churches in England with an inside length, excluding the tower, 
of 120 ft.; it has north and south transepts. The chancel is particularly large and 
its floor is raised to an unusual extent above that of the nave. It was built in the 
early thirteenth century and this plan is believed to have been adopted in order that 
a vaulted passage might be formed under the east end of the church to provide an 
ambulatory or processional way outside it. The new chancel extended right up to the 
road on its east front and the only possible method of constructing a path on consecrated
ground, allowing passage from the south to the north side outside the building,
was the one adopted. In his description of St Leonard’s Church*, Canon Scott 
Robertson gives other examples of churches having exterior processional ways which 
modified the construction of the building. He also cites various orders for processionals
issued by the Crown in the fifteenth century and other records relating
to the matter. The ambulatory of St Leonard’s has no connection with the enclosed 
space below the rest of the chancel and there are no windows in it. The church is 
built on the side of a hill rising to the north and there is a built-up path along its 
south exterior bounded by a wall which falls to the road below. The floor of the 
ambulatory is several feet above the level of this road, but it is below the level of 
the ground on the north side of the church. Earth had accumulated there right 
up to the apex of the doorway until it was cleared away in recent times. Being 
really above the level of the ground, this chamber, which became the ossuary, 
cannot properly be called a crypt. Dealing with the question of when the use for 
which it was built no longer existed, Canon Scott Robertson concludes: "As the 
Procession way would be in constant use until the Reformation was fully established,
I feel confident that nothing would have been allowed to obstruct free passage
through it with cross erect in solemn procession before the Reformation. Consequently
I believe that the large collection of human skulls and bones now stored

* “St Leonard's Church, Hythe,” Archaeologia Cantiana, Vol. xviii. 1889, pp. 408—420. An
interesting account of the church and town is given by the Rev. Herbert D. Dale, sometime Vicar, in:
St Leonard's Church, Hythe, from its Foundation with some Account of the Life and Customs of the
Town of Hythe from ancient Sources, 1931.





there could not have been placed within this Procession Path until after the
Reformation in the sixteenth century*." Professor Parsons suggests that the
ambulatory was used as such after it had become customary to place bones in it†.
The maximum width of the vault is eleven feet and it is clear that a pile as large 
as the one that is there to-day would have been a real obstruction. No great width 
appears to have been needed for the Procession, however, as the south porch of the 
church was built about a hundred years later than the chancel, and the east and 
west doors in it which originally gave access to the Path were made quite small. 
Professor Parsons also points out that pottery and other relics of pre-Reformation 
date were found near the bottom of the pile when it was re-stacked in 1908. This 
evidence is of little value, however, since mediaeval pottery had been found when 
the bones were re-stacked on earlier occasions and there is no record of whether
the potsherds were in or beneath the pile. The real objection to the theory that
no human remains were placed in the vault until after the Reformation is the fact 
that this would allow no more than a century for the accumulation of parts of over 
4000 skeletons, this number having been ascertained in 1908 by counting the 
femora present. If the use of the vault for ritual purposes is supposed to have been 
discontinued about 1550, then in little more than 100 years afterwards (see p. 155 
below) the stack of bones must have been nearly completed and it became the regular 
custom to show them to visitors. The yearly deaths in the town for this century are 
known to have been between 30 and 40. It is unlikely that the grave-diggers would 
disturb and preserve part of one skeleton every time they dug a grave and it was 
certainly only on rare occasions that they placed bones of children in the vault, 
although they must have dug them up quite frequently. On Canon Scott Robertson's
hypothesis, it is only possible to reconcile these numbers by supposing that at
some particular date between 1550 and 1650 there were at least 1000 skeletons 
exhumed at one time. These may have come from the graveyard of St Leonard's, 
or from that of one of the demolished churches, but there is no record of such an 
event and we have no reason to believe that any of the disused graveyards were 
built on during this period. 
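
The arithmetic behind this objection can be set out in a short sketch. The figures used (a period of about a century, 30 to 40 deaths a year, and over 4000 skeletons) are those quoted above; the fraction of burials supposed to add one preserved skeleton to the vault is a hypothetical parameter introduced here only for illustration, and is not an estimate made by the authors.

```python
# Figures quoted in the text.
YEARS = 100                  # roughly 1550 to 1650
DEATHS_PER_YEAR = (30, 40)   # lower and upper yearly deaths in the town
SKELETONS = 4000             # minimum number, from the femora counted in 1908


def burials_in_period(years, deaths_per_year):
    """Return the (lowest, highest) total number of burials in the period."""
    low, high = deaths_per_year
    return years * low, years * high


def unexplained_skeletons(skeletons, burials, fraction_recovered):
    """Skeletons left unaccounted for by ordinary grave-digging.

    fraction_recovered is an assumed (hypothetical) share of burials at
    which one older skeleton was disturbed and placed in the vault.
    """
    recovered = round(burials * fraction_recovered)
    return max(0, skeletons - recovered)


if __name__ == "__main__":
    low, high = burials_in_period(YEARS, DEATHS_PER_YEAR)
    for burials in (low, high):
        for fraction in (1.0, 0.75, 0.5):
            deficit = unexplained_skeletons(SKELETONS, burials, fraction)
            print(f"{burials} burials, fraction {fraction}: "
                  f"{deficit} skeletons unexplained")
```

Even on the extreme supposition that every burial at the lower rate of 30 a year added one skeleton to the vault, 1000 remain unaccounted for, which is one way of arriving at the figure of at least 1000 given above; any smaller fraction increases the deficit.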

The original area of the burial-ground of St Leonard's Church was probably 
quite small. After the chancel was built, it must have extended principally from 
the north side of the church, partly from the west also, but not at all from the 
south or east fronts as there were roadways along these two sides. The oldest 
tombstone found to-day in the area mentioned was erected in 1649, but the 
majority there date between 1780 and 1850. After the last of these dates the area 
was apparently extended to the north and west, and about 1880 a piece of land was 
added to the east of the church. Interments now take place in a ground removed 
from it. The area of the graveyard before 1700 was probably less than one acre. 
The Early Norman church was extended 27 ft. to the east when it was enlarged in 
Late Norman times and there was a further extension in the same direction, and 
to the same extent, when the present chancel and ambulatory were built in the early 

* Loc. cit., pp. 406—408. 

† “An explanation of the Hythe Bones,” Archaeologia Cantiana, Vol. xxx. 1914, pp. 208—218.




thirteenth century. It is extremely probable that the ground built on at these times 
formed part of the original graveyard, and not at all unlikely that the bones were 
disinterred then and that they were placed in the ambulatory as soon as it had been 
constructed. This was about 150 years after the foundation of the church, and it 
cannot be supposed, of course, that any number approaching 4000 skeletons can 
have been found then in a part only of one graveyard while other grounds were in 
use in the town. It appears to us most probable that a nucleus of the collection 
was formed in this way and that additions were made to it by grave-diggers during 
the following centuries. According to this hypothesis the passage must have 
been used at the same time both as an ambulatory and as an ossuary. The 
alternative one which supposes that no bones were placed there until the Reformation
cannot be disproved and the possibility that part, at least, of the remains were
taken from the other graveyards or from plague-pits must be admitted. The
evidence is not sufficient to decide such questions definitely*. 

The church registers have not been published. The word “Plague” is written 
against the entries for the years 1597 and 1623. The following totals refer to 
complete decades: 


Years        Burials   Marriages
1587—1596    298       84
1597—1606    309       119
1613—1622    433       †
1623—1632    434       106
1633—1642    361       61


During the Commonwealth there were hardly any entries in the registers. The 
survey made in 1566 referred to above gives the number of inhabited houses as 
122, the number of persons lacking habitation as 10 and the total engaged in 
fishing 160. There are two maps of Hythe and its immediate surroundings drawn 
in 1684 and 1685 respectively by Thomas Hill, “sworne Surveyor in Canterbury.” 
The first was prepared with the object of showing the land belonging to St John’s 
Hospital and the part showing the town is reproduced in Fig. 2‡. There appear to
be about 260 separate buildings and the area covered is approximately the same as 
that occupied by Hythe to-day if the buildings of the School of Musketry and all 
south of the Royal Military Canal are left out of account. The 1685 map shows 
the town on a small scale and there are far fewer buildings: it shows the lands 
belonging to the two hospitals of St John and St Bartholomew. We are indebted 
to the Governors of these institutions for permission to examine both maps and to 
reproduce part of the earlier one. The surveyor was not concerned principally with 
the town, and the number of houses shown on the 1684 map may be inexact as 
some other particulars certainly are. The census of 1801 gives a total of 212 

* [If a charnel-house existed in the graveyard as in so many English graveyards before the Reformation,
its contents, including possibly the bones disinterred when the ambulatory passage was built,
may have been transferred there after the Reformation, because either the charnel-house was dilapidated, 
or its site was required for other purposes. Ed.] 

† There were 76 marriages in the eight years 1615—1622.

‡ A reproduction of the complete map is given in Archaeologia Cantiana, Vol. xxx. 1914.




Fig. 2. A map of Hythe drawn in 1684 by Thomas Hill and reproduced by permission of the Governors of the Hospitals of St John and St Bartholomew.
The site of the church and churchyard of St Nicholas is shown to the west of the town near the spot marked H. 



inhabited houses in the parish of St Leonard and a population of 1365. In 1800 
there were 43 burials, and at the beginning of the eighteenth century there were 
generally fewer than 20 burials a year. The evidence suggests that between 
Elizabethan times and 1800 there were certain fluctuations in the size of the town, 
but it was probably larger at the later date than at any other time during the 
period. It is extremely probable that the population was rather larger than this in 
the earlier centuries, but it is not clear that it was many times larger as some 
writers have implied. The evidence of the Domesday survey has been considered. 
According to Leland there were 360 houses burnt down in the reign of Edward II. 
No confirmation of this statement can be found and it is extremely unlikely that 
the catastrophe was at West Hythe as Master Twyne stated. Events which had 
a profound effect on the town, and which are well documented, took place about a 
century later. Writing in 1570, Lambarde gives an account of a conflagration which
destroyed 200 houses, a pestilence and a calamity at sea which resulted in the loss 
of five ships and 100 men, all occurring about the same time in the reign of 
Henry IV*. The town was so badly hit that the survivors decided to abandon it,
which they would have done if the King had not granted them a charter releasing 
them from services on the next five occasions on which the Cinque Ports should 
be summoned. Lambarde says that he had seen this charter. It is among the 
archives of the Corporation of Hythe referred to by H. T. Riley† and it is dated in
the second year of the reign of Henry V (1414). The catastrophes are proved by 
other documents to have taken place in the time of Henry IV and all other 
particulars related by Lambarde appear to have been correctly reported. Records 
of the early fifteenth century represent the town as being in a state of utter filth 
and disrepair and, owing to its declining harbour, it appears to have had insufficient 
vitality to recover the important position it must have held in the thirteenth and 
fourteenth centuries. 

We should certainly not expect to find that there were many foreigners in the 
town at the time when there was a long-continued internecine feud between the 
Cinque Ports and the French fleets. Shortly after this, however, a Hundred paper 
(for 1422) records a Jurors’ report that there were many Frenchmen in the town 
who had not taken the oath to the King‡. The Patent and Close Rolls of earlier
years refer to orders which were given at different times to encourage, or obstruct, 
foreign traders. It is known that considerable numbers of foreigners settled in 
Kent in the reign of Edward III (1327—1377), but these were of Flemish or 
Walloon origin and none of them would have been called Frenchmen at this time. 
The influx of Huguenot refugees did not commence until nearly two hundred years 
later, and large numbers of them are known to have settled in the Kentish ports, 
and in Winchelsea and Rye in particular. Hythe does not seem to have received 
any of them, however. According to a census taken in 1622 there were only three 

* Op. cit., pp. 141—148.

† Fourth Report of the Royal Commission on Historical Manuscripts, 1874, pp. 429—489.

‡ This is quoted on p. 482 of the Report of the Royal Commission cited above.





foreigners in the town and these came from Flanders*. Few foreign names occur 
in the town records, or, it is said, in the parish registers, and there is no evidence 
that the nature of the population of Hythe was modified at all by admixture with 
aliens any time after the Norman Conquest. 

We may now consider the earliest direct references to the bones and the various 
hypotheses which have been advanced to account for their origin. Leland (1535—1543)†
gives a description of the town and he mentions the “very fair vaute”
under the choir of the parish church, but nothing is said about its contents. 
Lambarde (1570), Camden (1586) and Kilburne (1659) do not mention the ossuary. 
It is particularly unfortunate that Hythe should have been omitted entirely from 
Thomas Philipott's Villare Cantianum (1659), and it is not known why this 
striking omission occurs. The work is supposed to have been compiled principally 
from materials collected by John Philipott—the father of the author and a native 
of Folkestone—before 1640. The discovery of the crypt below Folkestone Church 
is referred to (see p. 142, and second footnote there). What appears to be the earliest 
description of the skeletons is one given by Jeake in his annotations to the Charters 
of the Cinque Ports‡. These had been written by 1678 although not published until
1728. The following is the complete account: 

“On the north (sic) side of the church is a charnel-house, or Golgotha, full of
dead mens bones, piled up together orderly, so great a quantity as I never saw 
elsewhere in one place; supposed by some to be gathered at the shore, after a 
great sea fight and slaughter of the French and English on that coast, whose 
carcases, or their bones after consumption of the flesh, might be cast up there, and 
so gathered and reserved for a memorandum.” 

Someone before Jeake had associated the skeletons with a skirmish between the 
inhabitants of the town and a party of Frenchmen, and the bones must have been 
seen by several people some time before he wrote. There is no suggestion that any 
additions were being made to the orderly pile of bones at this date. Jeake was 
Town Clerk of Rye and he was certainly well acquainted with local affairs. James 
Brome, our next authority, was Rector of Cheriton, near Folkestone, from 1676 to 
1719, and he was also Chaplain to the Cinque Ports and Vicar of Newington, which
is 2½ miles north-east of Hythe. His book was published in 1700, but the materials
for it had been collected some years earlier. After a short description of the town, 
Brome writes: 

“But which now more especially preserves still the fame, and keeps up the 
repute of this poor languishing port...is the charnel-house adjoining to the church, 
or the arched vault under it, wherein are orderly piled up a great stack of dead 
men’s bones and skulls, which appear very white and solid, but how or by what 

* Lists of Foreign Protestants and Aliens resident in England, 1618—1688, from Returns in the State
Paper Office, Camden Society, 1862, p. 14.

† The dates given here are the latest years, or periods, in which the several writers can have visited
Hythe before writing their accounts of it, as accurately as these dates can be determined. References
have been given above unless otherwise stated.

‡ Op. cit., p. 109.




means they were brought to this place the townsmen are altogether ignorant, and 
can give no account of the matter....” The writer supposes that they are probably 
those of the Frenchmen killed in 1295 and “after this slaughter these men's bones 
in all probability might be gathered up and laid there after which daily accessions 
of more might be made till they encreased to so vast a number as is still visible*.” 

At some time before 1700, and probably from about 1650, it had evidently 
become the custom to show the bones to visitors and this has been continued 
without intermission until the present day. No later account of the town has 
omitted to comment on its most spectacular attraction. It has been suggested that 
the earlier topographers fail to mention the bones because similar collections could 
then be seen at many other churches. It is known that there were ossuaries in the 
parish churches of Folkestone, Dover and Upchurch in Kent, and as late as 1751 
corpses were being deposited in crypts in the same county†. It is not known,
however, that it was customary to show these collections to visitors, and it is clear 
that this cannot have been the common practice at all the churches which possessed 
ossuaries. The evidence suggests forcibly that a large pile of bones had been 
gathered together in the ambulatory of St Leonard’s by the middle of the seventeenth
century and that about this time they were shown regularly. It is unlikely
that there were many later additions to the pile. There is an obvious reason 
why the bones here should be shown while the ossuaries of other churches were 
being bricked up and forgotten. Unlike them it is entirely above ground and, 
owing to its position, it is adequately lighted and ventilated. 

The next writer of any importance who discusses the origin of the skeletons is 
Dr John Harris whose History of Kent was published posthumously in 1719. He 
says that it had been a “long and very common enquiry, how and on what occasion 
they came there.” Three conjectures are offered: the first is that they are the 
bones of people buried in the grounds of the four ruined churches, and the fact 
that similar collections are found in other churches lends support to this view; 
the second is that they are the remains of the Frenchmen slaughtered in the 
town in 1295, and the third that they are the remains of the Saxons and 
Britons who fell in the last battle which Vortimer had with the Saxons. The 
writer is inclined to think that the last is the true explanation, and the fact 
that another ossuary had been discovered at Folkestone Church is supposed to 
render it more probable. The account given by Charles Seymour in 1776 need 
only be mentioned because it appears to contain the first printed reference to an 
inscription hanging in the vault attributing the remains to the Danes who are 
supposed to have been slain in the neighbourhood. It is not known who originated 

* Three years travels over England, Scotland and Wales, p. 270.

† An Act of Parliament (25 George II c. 11) was passed in that year: “To enable the parishioners
of East Greenwich to deposit Corpses in the Vaults or Arches under the Church of the said Parish and
to ascertain the fees that they shall pay for the same.”

Other ossuaries exist, or at one time existed, at Rothwell, Waltham Abbey and Manchester and
Ripon Cathedrals, and there were doubtless many others. The coffin containing the remains of John
Hunter was once housed with many others in the crypt of the church of St Martin-in-the-Fields and
bones not in coffins have been found there.




this theory, or when the inscription was first exhibited. Hasted (1790) rejects it 
and adopts a variation of one of Harris’s suggestions. The battle in 456 was fought 
somewhere between Hythe and Folkestone and, after the bones of the slain had 
remained for a time exposed on the sea shore until they became white, those of 
the Britons were removed to Hythe and those of the Saxons to Folkestone. The
possibility that the bones might have been derived from graveyards is not even 
considered. One or other of the above theories has been accepted by all later 
writers on the subject. 

Anthropologists are accustomed to finding that historical evidence has been 
perverted in order to establish a connection between a collection of human skeletons 
and a battle. In the present case there are two battles involved. They were both 
fought some centuries before the collection of bones could have been made and it 
is extremely unlikely that the field of either was anywhere close to Hythe. The
authority of Hasted and the spurious evidence of the inscription, first hung up in 
the ambulatory about the middle of the eighteenth century, and of which an old copy 
may still be found there, have been responsible for perpetuating obvious errors. The 
suggestion that the Frenchmen known to have been slain in the locality in 1295 
are represented is a far more plausible one, but there are said to have been fewer 
than 300 of these, and such numbers are generally exaggerated, while the imperfect 
remains of over 4000 skeletons have been found in the ambulatory. Brome 
appreciated this difficulty and he supposed that the original pile was increased by the 
later addition of other bones which came, presumably, from graveyards. The real 
objections to this view are, firstly, that such an assemblage would have shown a 
greater disproportion between the sexes than the one actually found and, secondly, 
that it would have been racially less homogeneous than it actually is. By far the 
most plausible hypothesis in the light of all the evidence appears to be that all the 
bones were taken from graveyards and it is probable that the majority of them, 
at least, came from the ground of St Leonard’s Church. The date when the collection
was started cannot be ascertained with certainty, but it may have been when the 
ambulatory was built in the early thirteenth century. There were probably few, 
if any, additions to it after the time about the middle of the seventeenth century 
when the ambulatory passage was first shown to visitors. There is little historical 
evidence bearing on the ethnic history of the inhabitants of Hythe. The district 
must have been almost as thickly populated as any in Kent in Roman times by 
marines, auxiliaries and traders. It is not known that there was any large Jutish 
settlement in the neighbourhood, but immediately before the Norman Conquest the
town was again a relatively important one. The origin of its mediaeval population 
may be disclosed by a study of the physical characters of the people themselves. 

(3) Anthropological Descriptions of the Hythe Skeletons. The earliest account
of the Hythe skeletons provided by anyone capable of examining them from an
anthropological point of view appears to be a short one published in 1834* by

* Physiognomy founded on Physiology, and applied to various Countries, Professions and Individuals:
with an Appendix on the Bones at Hythe....The Sculls of the ancient Inhabitants of Britain and its 
Invaders. The appendix is pp. 280—280. 




Alexander Walker, an Edinburgh anatomist and physiologist of repute in his day. 
He remarks that no rational account of the remains had been given by an anthro¬ 
pologist before his time. The contention that the skeletons deposited in the 
ambulatory were those of the soldiers slain in 456 in the battle between the Britons 
under Vortimer and the Saxons is accepted and the alternative one associating 
them with the Danish invaders of the tenth century is discarded as being less 
probable. The skulls are not those of one race, however, as had been supposed. 
“Two forms of scull, very distinct from each other, predominated:—one a long 
narrow scull, greatly resembling the Celtic of the present day;—the other, a short 
broad scull greatly resembling the Gothic....These were mixed with others of less 
definite character, in general so varied as to fall under no such obvious classification.” 
The mean maximum lengths and breadths of unstated numbers of the Celtic 
(Ancient British) and Gothic (Saxon or Danish) types are given and there are 
drawings in norma verticalis of contrasted specimens. The report that red hair
had been found led to a search, since this is “a striking characteristic of the Gothic
nations.” Walker writes: “I had not proceeded far, when I found several sculls
with masses of red hair matted upon them....In every instance, these were the short
and broad Gothic sculls; and nothing of the kind could be discerned on the British!”
In addition to the two principal types of cranium, another kind was found which 
was round-headed, but stronger, heavier and more capacious than either of the 
preceding kinds. These larger skulls, however, are “universally in a more imperfect 
state” than the others and they must be ascribed to Romans who had been killed
on the same battle-field in a previous age. The evidence of their superior development
“made one cease to be surprised that the Roman was easily the conqueror
and the master of the other races around him.” 

The next account of the Hythe skeletons worthy of notice is that of another 
Edinburgh anatomist, Robert Knox, who was a better known anthropologist than 
his predecessor in this field. He contributed two papers to the Transactions of the 
Ethnological Society in the early sixties*. The first gives a general description of 
the bones and of sundry anomalous specimens found. A few skulls of children were 
the only ones not attributed to adult men and it is stated explicitly that no female 
crania were observed. The type is said to resemble that of the present inhabitants 
of South Britain with “few varieties, and none peculiar or different from what we
now find.” It was concluded that anomalous conditions, such as inter-parietal bones 
and fronto-temporal articulation, occurred less frequently than in modern times. 
The various theories that had been proposed at different times to account for 
the contents of the ossuary are discussed, and Brome’s attributing them to the 
French invaders massacred in 1295 is accepted. Hence they belonged “to a mixed 
people composed of several races, amalgamated for the time into a nation, and 

* “Some Observations on a Collection of Human Crania and other Human Bones at present
preserved in the Crypt of a Church at Hythe in Kent,” Transactions of the Ethnological Society of
London, New Series, Vol. I. 1861, pp. 238—245. “Some additional Observations on a Collection of
Human Crania and other Human Bones, at present preserved in a Crypt of a Church at Hythe, in Kent,”
Ibid., Vol. II. 1863, pp. 136—140.





strictly analogous to the inhabitants of Kent at that period.” The suggestion that 
the collection is merely the remains of churchyard bones collected promiscuously 
is said to be untenable, though no arguments against it are brought forward. 
Knox says that the distinction made by Walker between round and long heads is 
not perceptible. He speaks of fragments of “Roman Saxon” pottery and of coarse
mediaeval earthenware found a short time before in re-stacking a portion of the 
pile of bones and then in his possession. The second communication of this writer 
followed a re-examination of the remains. The churchyard theory is said to be 
untenable, owing to the absence of female specimens, and also because the condition 
of the bones is unlike that of the skeletons of interred bodies, no signs of decay 
being observed. We are told that: “the pile as it now stands, and the crania on
the shelves were thus arranged some seventeen years ago by the son of General 
Frieze, who took a fancy to trouble himself with this labour. Prior to this, the 
entire mass lay heaped up in utter confusion upon the floor of the crypt....” A few 
useless measurements are given and it is remarked that the great deficiency of the 
type is in its frontal breadth. The view that the bones are those of
men who fell in battle is still regarded as the most probable, though the date of
this battle has yet to be determined. The theory that the remains of Frenchmen 
killed at the end of the thirteenth century had been preserved is thus given up by 
this writer. 

At some date in, or before, 1865 Frank (Francis T.) Buckland paid a visit to 
Hythe and he has a short article on the bones in a book published in that year*. 
He does not claim to have made any special inquiries regarding the history or 
anthropological nature of the material. He apparently saw no reason to question the 
statement ascribing them to the slain in the battle in Ethelwulf’s reign. This was
presumed to have taken place on flat ground between the old town and the sea,
and we are told that: “The House of the present mayor of Hythe is built upon
part of this battle-field, and in digging the foundation of it many bones were
discovered, whence a name is now given to this house, more expressive than
classical, viz. ‘Marrow-Bone Hall.’” This was possibly on the site of the fourth
church, or churchyard, in the town of which the exact location is not known. The 
red hair and the distinctions between the skulls of Ancient Britons and Danes 
were pointed out to visitors in Buckland’s time. Regarding the arrangement of 
the remains he says: “Mr Tournay, builder and clerk of the church, informed me
that the bones used to lie scattered in disorder till about twenty years ago, when 
they were arranged in their present decent order.” 

Barnard Davis’s catalogue of his own collection of skulls†, which is now in the
Museum of the Royal College of Surgeons, gives measurements of six specimens 
from the Hythe ossuary. One was purchased at the sale of Mr Heaviside’s Anatomical 
Museum in 1829, and it is said to have been obtained “by a lady, under peculiar 

* Curiosities of Natural History. Fourth Series. The article is entitled: “Ancestral Skulls.”

† Thesaurus Craniorum, 1867. Measurements and descriptions of the Hythe skulls are given on
pp. 42 and 44–47.




circumstances”; we are not told how the others were procured. Their owner had
never seen the ambulatory and its contents. Of the few skulls he had examined he
says: “the condition alone of these crania, if the evidence of form were wanting and the
period of the foundation of the church itself were not conclusive, will wholly exclude
the four first people named (viz. the Britons, Romans, Saxons and Danes) from any
participation in their ownership....That they could have been exposed to the air
and other deteriorating influences, even in this closed crypt, ever since the days of
the Royal Antiquary (Leland), now 330 years, and maintain their present appearance
is quite impossible. They are undoubtedly of more recent origin, and present
the conformation of the modern men of Kent.” They are said to resemble the
skulls of modern Germans. The measurements in inches and tenths are of little
value. Using the ophryo-occipital length and the maximum calvarial breadth, the
cephalic indices found are 77.0, 81.9, 82.4, 85.1, 87.5 and 88.1 and the mean index
is 83.7*. Two of the six specimens are supposed female, and an unnamed medical
correspondent had told Barnard Davis that both sexes were represented in the 
original collection. The divergencies in this and other respects between the account 
of the Hythe skulls given here and the earlier ones of Walker and Knox are 
striking enough. 
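
The mean index quoted for Barnard Davis's six Hythe specimens can be checked from the six individual values. The following is a minimal Python sketch, reading the six indices as 77.0, 81.9, 82.4, 85.1, 87.5 and 88.1; the variable names are illustrative only.

```python
# Cephalic indices (100 x breadth / length) quoted for the six Hythe
# skulls in Barnard Davis's collection.
indices = [77.0, 81.9, 82.4, 85.1, 87.5, 88.1]

# Simple arithmetic mean of the six indices.
mean_index = sum(indices) / len(indices)

print(f"mean cephalic index of {len(indices)} skulls = {mean_index:.1f}")
```

The result agrees, to one decimal place, with the mean of 83.7 given in the text.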

A short account of the skeletons was given by the Rev. T. G. Hall, Vicar 
of Hythe in 1889†. The conjectures of Walker and Knox are referred to and
the bones are supposed to be those of men slain in a battle. This paper is
cited here only because of a reference to an examination of the material made
by a Mr Prideaux. We are told that: “He devoted ten days to the careful
examination of these remains, during which he submitted to accurate measurement
some 600 skulls. He told the Vicar at the time that he was of opinion that a large
proportion of them were of the Celtic type, the greater part of the remainder being
of the Anglo-Saxon type. Two skulls he believed to be Roman in form, and two
Laps or Danes.” No other account of this investigation appears to have been
published. 

In the same year Canon Scott Robertson contributed an article on “St Leonard’s 
Church, Hythe” to the Archaeologia Cantiana‡. In a footnote (pp. 407–408) some
particulars are given relating to the skulls. At the Canon’s request they had been 
examined by Dr Randall Davis, surgeon of Hythe. Among 723 crania, only 36 
had been found with a persistent frontal suture and three of these were juvenile. 
Only 10 skulls were found with injuries inflicted before death and all of these were 
on the anterior half of the cranium. An immense diversity of size and shape was 
found in the material. Although not able to distinguish with certainty between male 
and female specimens himself, Dr Davis was told that Professor Owen had picked 
out many females. 

* Cf. values given in Table I below. Barnard Davis says that his Hythe skulls are remarkable for
their magnitude, but they may have been selected on this account. The type is actually a small one.

† “On Human Remains in the Crypt of St Leonard’s Church, Hythe,” Archaeologia Cantiana, Vol.
XVIII. 1889, pp. 333–336.

‡ Ibid., pp. 403–420.





It was not until nearly 30 years later that Mr (afterwards Professor) F. G. 
Parsons undertook a somewhat lengthy examination of the Hythe skeletons*. 
He says that 100 measurements on them had been taken by Dr Randall Davis in 
1899, but these do not appear to have been published. Professor Parsons deals 
first with the history of the bones, and after considering part of the historical 
evidence we have presented above he concludes that the bones had been dug up 
from this churchyard or neighbouring ones and stacked under the church “in the
way which was quite usual in pre-Reformation days.” Similar collections in the crypts
of other Kentish churches at Folkestone, Dover and Upchurch are referred to, and
a few mean measurements of short series from the last two are used for comparative
purposes. The remains at Hythe are said to represent at least 4000 men, women
and children, this number having been ascertained by “counting all the heads of
thighbones seen in re-stacking the whole pile which the vicar has lately had done.”
Several masses of hair, in which shades of red predominate, were found at this 
time and also some pottery characteristic of the fourteenth and fifteenth centuries, 
wooden platters and part of an old shoe or boot. Seven absolute measurements 
and two indices are given individually for 326 male, 230 female and 34 immature 
skulls and the means are compared with those of several English and a few other 
series. The form of the distribution is shown graphically in the case of three of 
the characters. It is concluded that the Hythe skulls resemble modern Bavarian
and Württemberg series more closely than they do the seventeenth century London
ones from Whitechapel and Moorfields. The need for more data relating to the
populations of other parts of England is emphasised. The average length (not
otherwise defined) of 76 male femora is given as 45.1 cm. and of 79 female as 41.8 cm.
These means are greater than the corresponding mean maximum lengths for the
Whitechapel femora. This paper of Professor Parsons is the first rational account
of the material, but from an anthropometric point of view it is clearly inadequate†.
His cranial measurements and remarks on anomalies and other features provided 
are dealt with more fully in later sections of our paper. 

The same writer gave a lecture on the bones to the Kent Archaeological Society 
in 1914‡. No fresh evidence had been collected in the interim, but several new
theories relating to the origin of the population are considered. Two kinds of 
cranium are said to be found, one being the characteristic long one met with in 
series from London plague-pits and the other being so short that nothing like it 
is found elsewhere in England. The types of Saxon and English Bronze Age 
skulls are not present. It is supposed that the short-headed element represents 
“continental people who settled here in a peaceful way with their women-folk, 
though I confess this is mere surmise....I can find no definite account of their 

* “Report on the Hythe Crania,” The Journal of the Royal Anthropological Institute, Vol. XXXVIII.
1908, pp. 419–450.

† In his “Report on the Rothwell Crania” (Journal of the Royal Anthropological Institute, Vol. XL.
1910, pp. 488–504) Professor Parsons gives a sagittal type contour for 80 male Hythe crania, but the
measurements from which it was constructed are not provided.

‡ “An Explanation of the Hythe Bones,” Archaeologia Cantiana, Vol. XXX. 1914, pp. 203–213.


coming: it may have been in the days of the wool staple in the reigns of Edward I 
and III, when so many foreigners were welcomed into England, or it may have 
been later....” An alternative suggestion is that the bodies are those of Wendish
or Vandal invaders who are presumed to have settled in this country in large 
numbers in Anglo-Saxon times. The remains in the ossuary might represent the 
type of these foreigners which persisted in this region until mediaeval times. Such 
an explanation would account for the large proportion of skulls with high cephalic 
indices, since the Vandals are thought to be of Slavonic origin and closely allied to 
modern Poles, but among other objections to it are the complete absence of the
type from all Jutish cemeteries examined and also the absence of any place-names 
in the district of Wendish origin. 

The earlier anatomists and anthropologists who examined the Hythe skeletons 
did little to correct the gross errors of the historians who had speculated on their 
age and derivation, and they even confused the problem by suggesting new and 
fantastic theories. Professor Parsons corrected many of the solecisms of his
predecessors and the measurements he took helped him to do this. These measurements
are very inadequate, however, and no excuse need be made for re-examining
the material from an anthropometric standpoint.

(4) The Nature of the Hythe Series. Unless otherwise stated, the following 
remarks refer to the sample of 199 Hythe skulls examined by the present writers.
Traces of reddish-brown hair were found by us, and considerable quantities appear
to have been noted by earlier observers. A plait several inches in length is preserved
in the showcase in the ambulatory. The hair is similar in colour and texture
to that found on ancient Egyptian skulls and there can be no doubt that it has 
been thoroughly bleached. There is no means of determining the integumentary 
colours prevailing in the living population. No sign of flesh was observed, but 
several of the skulls had fragments of brain in the brain-box. None of these was 
large and only a few had to be cut in order to extract them through the foramen
magnum. Fragments of dried brain are commonly met with in ancient Egyptian
skulls, but they are always more brittle, and more deprived of their organic 
constituents, than are the like fragments from Hythe. On being scraped with a 
knife the latter have a glossy dark-grey surface. 

Turning to the colour of the crania, those excavated from London graveyards 
or plague-pits and from the Spitalfields site have been found to be very uniform 
in so far as the appearance and condition of the bone is concerned. Only small 
quantities of hair were discovered on them; those from the Farringdon Street 
and Spitalfields sites contained no remains of the brain; several of the specimens 
(apart from those of the Spitalfields series where only one skull was affected) have 
green copper stains on them and all are extensively discoloured, the majority being 
of a dark brown shade. No trace of staining due to copper salts was found on the 
199 Hythe skulls*. As far as their condition is concerned, the whole series in the 

* It is worthy of record that fragments of half decayed wood were found adhering to the supra-occipitals
of two skulls (758 and No. #). Professor Parsons also found some “particles of woody fibre”
mixed with a few hairs adhering to the occiputs of some specimens.





ambulatory appears to be remarkably homogeneous. Nearly all the bones are of
a whitish-grey colour which is very distinctly lighter than that almost invariably
observed in the case of excavated London skeletons. One specimen (1001) is a 
mottled reddish-brown and a few others have dark-grey or light-brown stains on 
the calvaria, but the prevailing colour is little darker than that which a macerated 
bone might acquire after lying undusted for centuries. If a body be buried, with 
or without a coffin, the colour which the skeleton adopts in the course of time
is determined by the nature of the soil and it may be no reliable guide to the
age of the interment. The ambulatory passage of St Leonard’s is entered by a
door on the south side which is often open and the ventilation is more than
adequate at all times. The sea air probably bleached the bones, as it certainly
did the hair. The majority of the skulls now on the shelves are well preserved; 
some show signs of wear, such as partly eroded outer tables, and a few are 
reduced to a friable condition. Numbers have been painted on the frontal or 
parietal bones and the highest of these found was 1099. Professor Parsons says 
that there are enormous numbers of fragmentary skulls in the great pile which 
was re-stacked while he was at Hythe. The femora were counted then and they
are said to represent at least 4000 people. The remains other than the skulls on 
the shelves have been arranged in a single pile which is roughly rectangular in 
form and one side is against the west wall of the chamber. The three sides which can 
be examined are made up almost entirely of the ends of femora with a few other 
long-bones and crania intercalated between them. The dimensions of the pile are 
approximately 24 by 5½ ft. at the base and the average height, excluding the
supporting bricks, is 5 ft. It is clear that the centre, which cannot now be examined
except on the top, cannot contain, in addition to the “enormous numbers of fragmentary
skulls,” anything approaching the complete remains of the other parts of
4000 skeletons*. A stringent selection must have been made, either when the
bones were first placed in the vault, or at some subsequent date. The collectors of 

* It appears from Professor Parsons’ measurements that the average length of the Hythe femora
without regard to sex is approximately 17 inches. On this assumption, the femora which form three
sides of the pile occupy almost exactly one-third of its total volume. But the remaining two-thirds is
said to consist largely of fragmentary skulls and femora, so it is clear that the other bones of the skeleton
are only present in comparatively small numbers. There are at present about 1100 skulls on the shelves
in the north and south bays of the ambulatory. A printed notice hanging there says that 600 skulls were
taken from the pile and arranged on the shelves in 1651, while 500 have been added since. According
to the same notice the bones were re-stacked in 1906 to allow the passage of air underneath the pile and
to preserve them from decay. Nearly 6000 thighbones were then counted besides fragments. Several
writers have given the dimensions of the pile at different dates: in 1776 it was said to be 26 × 6 × 8 ft.
high (Seymour), in 1788, 80 × 8 × 8 ft. (see Plate Ia below), in 1790, 28 × 8 × 8 ft. (Hasted), and the Hythe and
Sandgate Guide published in 1816 describes the pile as being 26 ft. in length, 8 in breadth and formerly
8 in height, though the last measurement had by this time been reduced by about 2 ft. owing to the
decay of the lower bones. The dimensions to-day are considerably less than these. The 1100 skulls
have been removed since 1816 and the process of decay has doubtless continued. No dried bones are
likely to remain intact long if they are near the bottom of a stack 5 ft. high. The Rev. H. D. Dale says
that some skulls were allowed to be taken away before he was Vicar of St Leonard’s (St Leonard’s
Church, Hythe, 1931, p. 56). Six of these are now in the Museum of the Royal College of Surgeons,
but the location of the others, if they are still preserved, is unknown. Plate I reproduces some engravings
of the interior of the ambulatory made at different times.



this material, which has been drawn from a minimum of 4000 skeletons, must have 
selected the thighbones and the skulls, while discarding in general other parts of 
the skeleton and probably the mandible*. The differences between the conditions
of individual specimens, in so far as the texture of the bone is concerned, are not great 
in general. The skulls which Barnard Davis acquired had probably been selected 
from a sample which was itself a stringent selection of the total population con¬ 
cerned. Even if this had not been so, it is not clear that he was at all justified in 
asserting that bones so well preserved could not possibly have been in the ossuary 
for more than 300 years. The Spitalfields series is almost certainly of mediaeval 
or Roman date. Most of the skulls were broken when houses had been built on the 
grave-pits, or at the time of excavation, but the bones are as well preserved as the
Hythe bones and they show no signs of abrasion or decay. This is remarkable when 
we consider that the burial-ground at Spitalfields was in a low-lying locality 
which was frequently water-logged† and as a rule bones would decay more rapidly
under these conditions than they would in most churchyards‡. The condition of
the Hythe skeletons cannot be accepted as evidence of their modern date, and 
Barnard Davis was probably judging by the colour as well as the texture of the 
bone when he denied the antiquity of the skulls in his possession§. It is not 
surprising to find that fewer than half of the Hythe crania are intact. The defects 
of many of the facial skeletons may be due to the spade of the sexton, or to the 
fact that these specimens were once lying near the bottom of a pile of bones eight 
feet high. There are numerous post-mortem fractures of the calvaria as well and 
many of these must have been mistaken in the past—by anatomists as well as 
others—for wounds inflicted during life. Among 112 male skulls we found only 
three with healed wounds; two (646 and 898) being small areas on the frontal 
bone and the other (798) having probably resulted from a sword-cut on the right 
parietal which reached the brain. Among 87 female specimens, two (655 and 758) 
have small depressions, on the frontal and right parietal bone respectively, which
appear to be healed injuries. The frequency of traumatic lesions is not unusually 
high in this sample, although, judging from the long-bones and skulls he examined, 

* The preference for the skull and thighbones as representative of the physical remains of the dead 
is characteristic of much mediaeval symbolism. In most ossuaries and “Gebeinhauser” these are the 
only bones preserved. 

† Evidence that this was so at the end of the fourteenth century has been given in the Spitalfields
paper (p. 210). The following account shows that the conditions were much the same in the middle of
the eighteenth century: “Before 1776...every person in Spital Square in the Liberty of Norton Falgate
(sic) was greatly inconvenienced by the springs in the liberty, insomuch that...the water...used to be
three or four feet deep in the cellars; and the servants used to punt themselves along in a washing tub
from the cellar stairs to the beer-barrels to draw daily beer....” (Steven Totten: “A Humble Representation...”
1795.)

‡ In the Museum of the Royal College of Surgeons there are a number of skulls dredged from the
Thames and some of these are probably more than 1000 years old. Nearly all are in a good state of
preservation. It is known, too, that objects of cloth and leather may be excellently preserved in peat-bogs
for several hundred years. A district which is subject to periodic inundations would almost
certainly encourage decay when the articles in question had been buried close to the surface.

§ Brome, writing at the end of the seventeenth century, says that the bones are very white and 
solid. 





Professor Parsons concluded (without giving figures) that there was evidence of 
the turbulent life which the inhabitants of a Cinque Port experienced during the 
middle ages. He also observes that many of the skulls had earth in the brain- 
cavity as well as in the facial and auditory apertures, and this is again contrary to 
our experience. We found a good deal of earth in the facial and basal apertures, 
but the endocranial surfaces of the brain-boxes appeared to be almost entirely free 
from it. The skulls were examined carefully before filling with mustard-seed to 
determine the capacities and nothing was found inside them except a certain 
amount of dust which might have accumulated in the ambulatory and the fragments 
of brain already referred to. As Professor Parsons does not mention the last, it is 
possible that he did not examine the contents carefully and mistook fragments of 
brain for lumps of earth.
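
Two figures in the footnote on the bone pile, the mean femoral length of about 17 inches and the statement that the femora on the three visible sides fill about one-third of the pile, can be reproduced by simple arithmetic. The Python sketch below is one reading only. It assumes, without warrant from the text, that the femora lie end-on in a shell one femur deep, and that it is a long side of the pile which stands against the wall; all variable names are illustrative.

```python
CM_PER_INCH = 2.54

# Parsons' mean femoral lengths (cm) with sample sizes, as quoted in the text.
male_cm, male_n = 45.1, 76
female_cm, female_n = 41.8, 79

# Pooled mean length without regard to sex, converted to inches.
pooled_cm = (male_cm * male_n + female_cm * female_n) / (male_n + female_n)
pooled_in = pooled_cm / CM_PER_INCH

# Present dimensions of the pile in feet: base 24 by 5.5, height 5.
length_ft, width_ft, height_ft = 24.0, 5.5, 5.0
femur_ft = 17.0 / 12.0  # the 17-inch femur of the footnote

# ASSUMPTION (not stated in the text): the femora form a shell one femur
# deep on both short ends and on one long side, the other long side
# standing against the wall.
core = (length_ft - 2 * femur_ft) * (width_ft - femur_ft) * height_ft
total = length_ft * width_ft * height_ft
shell_fraction = 1.0 - core / total

print(f"pooled femoral length: {pooled_cm:.1f} cm = {pooled_in:.1f} in")
print(f"fraction of pile occupied by the femoral shell: {shell_fraction:.2f}")
```

Under these assumptions the shell comes to a little over one-third of the whole. Had a short end stood against the wall instead, the same reckoning would give more than one-half, which is why the first arrangement is assumed here.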

The sample of 199 crania described in the present paper was selected from among 
some 500 specimens kept on shelves in the far (north) bay of the ambulatory. Those 
chosen were all adult and intact, or nearly intact. The numbers in Professor 
Parsons’ report range from 1 to 590 and all these skulls, with the exception of one 
(No. 132), appear to be kept at present on the shelves in the south bay near the door. 
The samples dealt with are distinct ones, except No. 132 which is included in both. 

It is most charitable to assume, perhaps, that the examinations of the Hythe 
remains made by Walker and Knox were of a more cursory nature than they would 
lead one to believe, for otherwise it is difficult to understand how they failed to find 
female specimens. The sexing of the skulls offers no peculiar difficulty and it is 
obvious that a considerable proportion are those of women. Of the 199 adult crania 
which we measured, 112 (56.3%) were supposed male and 87 (43.7%) female.
Professor Parsons dealt with 556 adults and he distinguished 326 (58.6%) males
and 230 (41.4%) females. The disproportion between the sexes is doubtless due
to the stringent post-mortem selection which would have favoured the preservation 
of the stronger male specimens. As far as this evidence can show, the original 
population may well have been a graveyard one. The statistical constants for the 
series dealt with in detail in the present paper are given in Table II. A comparison
with the standard deviations and recalculated means found from the
measurements given by Professor Parsons is made in Table I. We have not used 
his auricular height as it was taken from the centre of the meatus and it is hence 
not comparable with the biometric measurement taken from the porion. If the 
two series compared had been drawn randomly from the total collection, we 
should not expect any of the constants to show significant differences. Among 
the absolute measurements the only differences between the means which exceed 
2.5 times their probable errors are in the case of the male B (Δ/p.e. Δ = 8.0), male
B′ (2.5) and female B′ (3.6). The marked difference between the male calvarial
breadths evidently leads to significant differences between the indices 100 B/F,
100 B/L and 100 B/H′. The female indices are not differentiated. The single
significant difference in variability is found in the case of the female H′ (3.0).
The discordance between the male mean calvarial breadths is, however, so marked 





TABLE I.

Comparison of the Measurements of two Samples of the Hythe Skulls.

Samples measured by     Sex  F                  L                  B                  B′                 H′

Means
Parsons                 ♂    177.1 ± .24 (324)  178.6 ± .24 (319)  143.6 ± .21 (324)  99.5 ± .17 (318)   133.5 ± .21 (306)
Stoessiger and Morant   ♂    176.3 ± .40 (112)  177.9 ± .39 (112)  146.7 ± .34 (112)  98.6 ± .31 (109)   134.1 ± .32 (112)
Parsons                 ♀    170.8 ± .29 (230)  171.1 ± .28 (227)  139.8 ± .21 (230)  96.0 ± .18 (227)   127.7 ± .25 (222)
Stoessiger and Morant   ♀    170.8 ± .44 (87)   171.4 ± .43 (87)   140.2 ± .39 (87)   94.7 ± .31 (87)    127.2 ± .34 (86)

Standard Deviations
Parsons                 ♂    6.45 ± .17         6.48 ± .17         5.5? ± .15         4.52 ± .12         5.48 ± .15
Stoessiger and Morant   ♂    6.34 ± .29         6.16 ± .28         5.35 ± .24         4.77 ± .22         4.99 ± .22
Parsons                 ♀    6.6? ± .21         6.34 ± .20         4.09 ± .15         3.98 ± .13         5.59 ± .18
Stoessiger and Morant   ♀    6.15 ± .31         5.96 ± .30         5.44 ± .28         4.30 ± .22         4.68 ± .24

Samples measured by     Sex  100 B/F            100 B/L            100 H′/L           100 B/H′

Means
Parsons                 ♂    81.0 ± .16 (322)   {80.3 (319)}       {74.7 (306)}       {107.5 (306)}
Stoessiger and Morant   ♂    {83.2 (112)}       82.6 ± .24 (112)   75.4 ± .22 (112)   109.5 ± .32 (112)
Parsons                 ♀    82.0 ± .18 (230)   {81.7 (227)}       {74.6 (222)}       {109.5 (222)}
Stoessiger and Morant   ♀    {82.1 (87)}        81.9 ± .28 (87)    74.3 ± .25 (86)    110.3 ± .38 (86)

Standard Deviations
Parsons                 ♂    3.88 ± .10         -                  -                  -
Stoessiger and Morant   ♂    -                  3.69 ± .17         3.51 ± .16         4.94 ± .22
Parsons                 ♀    3.96 ± .12         -                  -                  -
Stoessiger and Morant   ♀    -                  3.83 ± .20         3.44 ± .18         6.18 ± .27
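
The ratios of difference to probable error quoted in the text can be recomputed from the means of Table I. The Python sketch below assumes that the two samples are independent, so that the probable error of a difference is the root of the sum of the squared probable errors; the function name and the choice of cases are illustrative only. Since it works from the rounded tabulated values, the male B ratio comes out at about 7.8 rather than the 8.0 quoted, a discrepancy within what rounding of the means and probable errors could produce.

```python
from math import sqrt


def diff_over_pe(mean_a, pe_a, mean_b, pe_b):
    """Absolute difference of two means over the probable error of that
    difference, assuming the two samples are independent."""
    return abs(mean_a - mean_b) / sqrt(pe_a ** 2 + pe_b ** 2)


# (Parsons mean, p.e., Stoessiger and Morant mean, p.e.) taken from Table I.
cases = {
    "male B": (143.6, 0.21, 146.7, 0.34),
    "male B'": (99.5, 0.17, 98.6, 0.31),
    "female B'": (96.0, 0.18, 94.7, 0.31),
    "male F": (177.1, 0.24, 176.3, 0.40),
}

ratios = {name: diff_over_pe(*values) for name, values in cases.items()}

for name, ratio in ratios.items():
    flag = "exceeds 2.5" if ratio >= 2.5 else "below 2.5"
    print(f"{name}: difference / p.e. = {ratio:.1f} ({flag})")
```

The male F case is included only as a contrast, being one of the characters for which the two samples do not differ significantly.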








TABLE II.

Constants of the Male and Female Hythe Series.

                  Means                                   Standard Deviations         Coefficients of Variation
Character         Male                Female              Male          Female        Male           Female

C                 1456.3 ± 7.0 (110)  1318.0 ± 7.3 (83)   109.7 ± 5.0   98.5 ± 5.2    7.53 ± .34     7.47 ± .39
F                 176.3 ± .40 (112)   170.8 ± .44 (87)    6.34 ± .29    6.15 ± .31    3.60 ± .16     3.60 ± .18
L′                176.3 ± .40 (112)   169.8 ± .41 (87)    6.29 ± .28    5.71 ± .29    3.57 ± .16     3.37 ± .17
L                 177.9 ± .39 (112)   171.4 ± .43 (87)    6.16 ± .28    5.95 ± .30    3.46 ± .16     3.47 ± .18
B                 146.7 ± .34 (112)   140.2 ± .39 (87)    5.35 ± .24    5.44 ± .28    3.65 ± .16     3.88 ± .20
IOW               98.1 ± .24 (110)    94.3 ± .28 (86)     3.70 ± .17    3.84 ± .20    3.77 ± .17     4.07 ± .21
B′                98.6 ± .31 (109)    94.7 ± .31 (87)     4.77 ± .22    4.30 ± .22    4.84 ± .22     4.54 ± .23
B″                124.8 ± .36 (102)   119.1 ± .35 (84)    5.32 ± .25    4.79 ± .25    4.26 ± .20     4.02 ± .21
Biasterionic B    112.7 ± .27 (97)    108.4 ± .35 (86)    3.88 ± .19    4.77 ± .25    3.44 ± .17     4.40 ± .23
H′                134.1 ± .32 (112)   127.2 ± .34 (86)    4.99 ± .22    4.68 ± .24    3.72 ± .17     3.68 ± .19
H                 134.6 ± .31 (112)   127.9 ± .34 (86)    4.86 ± .22    4.74 ± .24    3.61 ± .16     3.71 ± .19
OH                115.4 ± .26 (112)   110.7 ± .24 (86)    4.02 ± .18    3.68 ± .19    3.48 ± .16     3.33 ± .17
LB                100.5 ± .23 (112)   95.4 ± .25 (86)     3.68 ± .16    3.40 ± .17    3.56 ± .16     3.56 ± .18
S1′               111.4 ± .27 (112)   106.6 ± .31 (87)    4.21 ± .19    4.35 ± .22    3.78 ± .17     4.09 ± .21
S2′               108.9 ± .37 (112)   105.1 ± .40 (87)    5.77 ± .26    5.53 ± .28    5.30 ± .24     5.27 ± .27
S3′               96.1 ± .32 (112)    93.7 ± .37 (87)     5.00 ± .23    5.17 ± .26    5.20 ± .23     5.52 ± .28
S1                127.2 ± .37 (112)   121.9 ± .44 (87)    5.79 ± .26    6.02 ± .31    4.56 ± .21     4.94 ± .25
S2                122.3 ± .44 (112)   118.2 ± .48 (87)    6.87 ± .31    6.65 ± .34    5.62 ± .25     5.62 ± .29
S3                116.2 ± .48 (112)   111.6 ± .65 (87)    7.55 ± .34    7.58 ± .39    6.50 ± .29     6.79 ± .35
S                 365.6 ± .96 (112)   351.6 ± .94 (87)    15.03 ± .68   12.99 ± .66   4.11 ± .19     3.69 ± .19
U                 518.4 ± .97 (112)   499.6 ± .90 (84)    15.28 ± .69   12.51 ± .65   2.95 ± .13     2.50 ± .13
Q′                323.6 ± .68 (112)   307.7 ± .72 (84)    10.71 ± .48   9.83 ± .51    3.31 ± .15     3.19 ± .17
Bregmatic Q′      320.6 ± .67 (112)   304.0 ± .71 (85)    10.49 ± .47   9.72 ± .50    3.27 ± .15     3.20 ± .17
Broca’s Q′        311.3 ± .68 (112)   297.2 ± .76 (84)    10.64 ± .48   10.33 ± .54   3.42 ± .15     3.48 ± .18
fml               35.6 ± .16 (111)    35.1 ± .16 (86)     2.44 ± .11    2.19 ± .11    6.86 ± .31     6.25 ± .32
fmb               30.2 ± .14 (110)    29.6 ± .14 (87)     2.25 ± .10    1.88 ± .10    7.46 ± .34     6.36 ± .33
G′H               69.9 ± .27 (89)     65.1 ± .27 (72)     3.84 ± .19    3.42 ± .19    5.49 ± .28     6.25 ± .30
GL                94.9 ± .35 (89)     90.5 ± .36 (70)     4.83 ± .24    4.50 ± .26    5.09 ± .25     4.98 ± .28
GB                94.8 ± .35 (99)     90.2 ± .34 (85)     5.17 ± .25    4.67 ± .24    6.45 ± .26     5.18 ± .27
J                 134.3 ± .35 (96)    125.3 ± .39 (71)    5.12 ± .25    4.85 ± .27    3.81 ± .19     3.87 ± .22
NH, R             51.0 ± .19 (111)    47.6 ± .16 (86)     2.94 ± .13    2.26 ± .12    5.76 ± .26     4.75 ± .24
NH, L             51.1 ± .18 (112)    47.6 ± .17 (87)     2.85 ± .13    2.34 ± .12    5.68 ± .25     4.91 ± .25
NH′               49.2 ± .22 (99)     46.2 ± .19 (82)     3.24 ± .16    2.51 ± .13    6.59 ± .32     5.43 ± .29
NB                24.8 ± .11 (108)    23.7 ± .13 (85)     1.66 ± .08    1.81 ± .09    6.71 ± .31     7.63 ± .40
DS                12.3 ± .12 (70)     11.6 ± .17 (39)     1.51 ± .09    1.61 ± .12    12.29 ± .71    13.91 ± 1.08
DC                21.8 ± .17 (72)     20.0 ± .22 (44)     2.09 ± .12    2.21 ± .16    9.71 ± .55     10.70 ± .78
DA                36.0 ± .26 (70)     32.3 ± .29 (40)     3.08 ± .18    2.70 ± .20    8.79 ± .50     8.37 ± .64
SS                4.5 ± .06 (106)     3.8 ± .07 (77)      0.98 ± .05    1.02 ± .06    21.84 ± 1.06   26.77 ± 1.56
SC                9.6 ± .14 (109)     9.3 ± .14 (79)      2.16 ± .10    1.89 ± .10    22.53 ± 1.08   20.28 ± 1.13
G1                50.8 ± .19 (96)     47.1 ± .25 (64)     3.14 ± .15    2.91 ± .17    6.18 ± .30     6.18 ± .37
G1′               46.4 ± .19 (98)     43.3 ± .22 (68)     2.72 ± .13    2.74 ± .16    5.86 ± .28     6.34 ± .37
G2                41.7 ± .28 (51)     39.4 ± .22 (44)     2.94 ± .20    2.20 ± .16    7.05 ± .47     5.59 ± .40
EH                12.1 ± .28 (47)     10.3 ± .23 (42)     2.81 ± .20    2.24 ± .16    23.25 ± 1.70   21.73 ± 1.67
O1, R             42.1 ± .10 (111)    41.6 ± .13 (85)     1.56 ± .07    1.76 ± .09    3.70 ± .17     4.23 ± .22
O1, L             41.8 ± .10 (109)    41.2 ± .13 (84)     1.56 ± .07    1.70 ± .08    3.72 ± .17     4.13 ± .22
O1′, R            40.4 ± .11 (82)     39.3 ± .18 (56)     1.51 ± .08    2.01 ± .13    3.73 ± .20     5.11 ± .33
Lacrymal O1, R    39.0 ± .11 (73)     38.2 ± .16 (54)     1.44 ± .08    1.77 ± .11    3.70 ± .21     4.64 ± .30
O2, R             33.0 ± .12 (111)    32.7 ± .14 (84)     1.90 ± .09    1.86 ± .10    5.77 ± .26     6.68 ± .30
O2, L             33.0 ± .12 (110)    32.7 ± .13 (82)     1.81 ± .08    1.81 ± .10    6.47 ± .25     5.53 ± .29
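
The three groups of constants in Table II are connected by simple relations, which can be used to check any row. The Python sketch below assumes the usual normal-theory formulae of the period (probable error = 0.6745 × standard error); these formulae are not stated in this section, and the function names are illustrative. The probable error of the coefficient of variation is given only in its simplest approximate form.

```python
from math import sqrt

PE_FACTOR = 0.6745  # probable error as a multiple of the standard error


def pe_mean(sd, n):
    """Probable error of a mean from n observations."""
    return PE_FACTOR * sd / sqrt(n)


def pe_sd(sd, n):
    """Probable error of a standard deviation (normal-theory value)."""
    return PE_FACTOR * sd / sqrt(2 * n)


def coeff_var(mean, sd):
    """Coefficient of variation, 100 x s.d. / mean."""
    return 100.0 * sd / mean


def pe_cv(cv, n):
    """Approximate probable error of V, neglecting the small factor
    sqrt(1 + 2 (V/100)^2)."""
    return PE_FACTOR * cv / sqrt(2 * n)


# Male F from Table II: mean 176.3, s.d. 6.34, n = 112.
mean, sd, n = 176.3, 6.34, 112
cv = coeff_var(mean, sd)

print(f"p.e. of mean = {pe_mean(sd, n):.2f}")
print(f"p.e. of s.d. = {pe_sd(sd, n):.2f}")
print(f"V = {cv:.2f} with p.e. {pe_cv(cv, n):.2f}")
```

For the male F these relations return the tabulated .40, .29 and 3.60 ± .16; the same check applied to the female B (140.2, 5.44, n = 87) returns .39 and .28.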







TABLE II (continued).

                        Means                                   Standard Deviations
Character               Male                Female              Male           Female

100 B/L                 82.6 ± .24 (112)    81.9 ± .28 (87)     3.69 ± .17     3.83 ± .20
100 H′/L                75.4 ± .22 (112)    74.3 ± .25 (86)     3.51 ± .16     3.44 ± .18
100 H/L                 75.7 ± .22 (112)    74.7 ± .27 (86)     3.51 ± .16     3.71 ± .19
100 B/H′                109.5 ± .32 (112)   110.3 ± .38 (86)    4.94 ± .22     6.18 ± .27
100 B/H                 109.2 ± .32 (112)   109.7 ± .37 (86)    6.00 ± .23     6.03 ± .26
100 (B−H′)/L            7.1 ± .23 (112)     7.6 ± .26 (80)      3.68 ± .17     3.51 ± .18
100 G′H/GB              73.5 ± .34 (80)     72.0 ± .38 (70)     4.57 ± .24     4.72 ± .27
100 NB/NH, R            48.9 ± .24 (107)    50.0 ± .30 (85)     3.67 ± .17     4.09 ± .21
100 NB/NH, L            48.8 ± .22 (108)    50.0 ± .29 (85)     3.41 ± .16     3.94 ± .20
100 NB/NH′              50.7 ± .28 (95)     51.6 ± .35 (80)     4.07 ± .20     4.63 ± .25
100 DS/DC               56.8 ± .67 (70)     57.1 ± .96 (39)     8.31 ± .47     8.90 ± .68
100 SS/SC               48.0 ± .65 (100)    41.5 ± .77 (77)     9.92 ± .46     10.03 ± .55
100 G2/G1               82.9 ± .51 (46)     84.3 ± .80 (39)     6.96 ± .49     7.38 ± .56
100 G2/G1′              90.7 ± .79 (48)     91.7 ± .75 (42)     8.07 ± .56     7.18 ± .53
100 EH/G2               28.9 ± .64 (47)     26.2 ± .54 (42)     6.53 ± .45     5.21 ± .38
100 O2/O1, R            78.5 ± .32 (110)    78.7 ± .33 (84)     4.98 ± .23     4.50 ± .23
100 O2/O1, L            79.0 ± .31 (109)    79.5 ± .34 (82)     4.73 ± .22     4.53 ± .24
100 O2/O1′, R           81.0 ± .40 (81)     83.2 ± .47 (55)     5.34 ± .28     5.13 ± .33
100 O2/Lacry. O1, R     84.0 ± .42 (73)     85.9 ± .39 (53)     5.34 ± .30     4.17 ± .27
100 fmb/fml             84.9 ± .30 (110)    84.4 ± .40 (86)     5.56 ± .25     5.51 ± .28
Oc. I.                  59.8 ± .16 (112)    61.3 ± .22 (87)     2.47 ± .11     3.04 ± .16
N∠                      64.6° ± .22 (89)    65.9° ± .28 (70)    3.04° ± .15    3.44° ± .17
A∠                      73.7° ± .18 (89)    73.4° ± .28 (70)    2.51° ± .13    3.49° ± .18
B∠                      41.7° ± .17 (89)    40.7° ± .20 (70)    2.40° ± .12    2.43° ± .14
Alveolar P∠             85.9° ± .24 (88)    86.1° ± .30 (72)    3.38° ± .17    3.74° ± .21
Prosthion P∠            84.0° ± .28 (88)    84.2° ± .28 (69)    3.50° ± .18    3.50° ± .20
θ1                      29.4° ± .19 (88)    27.9° ± .22 (70)    2.68° ± .14    2.76° ± .16
θ2                      12.2° ± .22 (88)    12.8° ± .27 (70)    3.05° ± .16    3.32° ± .19


that it needed inquiry. It may have arisen owing to the fact, that the two series 
in the south and north bays are really differentiated, having as we know been 
shelved at different periods and hence possibly corresponding to different strata in 
the original stack*. Or, personal equation may account for the difference. In
order to decide for, or against, the latter alternative, we determined independently
the calvarial lengths, breadths and heights in the case of two different samples 
selected at random from the series measured by Professor Parsons. The resulting 
means are given below, no regard having been paid to sex: 



                      First sample                               Second sample
Measured by    Parsons (P)   Stoessiger (S)   (P − S)     Parsons (P)   Morant (M)    (P − M)

L              174.6 (65)    174.7 (65)       − 0.1       176.4 (52)    176.1 (52)    + 0.3
B              141.4 (64)    142.0 (64)       − 0.6       143.4 (61)    143.6 (61)    − 0.2
H'             130.3 (65)    130.6 (65)       − 0.3       132.2 (50)    132.1 (50)    + 0.1


* A third, but less probable, hypothesis is that in selecting the more complete skulls for
measurement we have unconsciously selected the more brachycephalic which, being rounder, might on the whole
be less fragile. See, however, the Note inserted at the end of this memoir.










All the differences between these means are quite insignificant and the cephalic
indices (100 B/L) they lead to are all between 81.0 and 81.5, so they accord well
with the values for the total sample measured by Professor Parsons. We can only
conclude that our series does in fact differ significantly from his in leading to a
greater mean calvarial breadth and higher cephalic index*. The variabilities of
the two for these characters are not differentiated.
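The check on the cephalic indices can be repeated from the sample means tabulated above. The following is only an illustrative sketch: the function name and the dictionary layout are introduced here and are not part of the memoir, and the index is taken, as in the text, to be 100 B/L.

```python
def cephalic_index(breadth_mm: float, length_mm: float) -> float:
    """Return the cephalic index 100 * B / L."""
    return 100.0 * breadth_mm / length_mm


# Mean calvarial length L and breadth B (mm.) for the two re-measured samples.
# The keys are illustrative labels, not the authors' notation.
sample_means = {
    "first sample, Parsons": (174.6, 141.4),
    "first sample, Stoessiger": (174.7, 142.0),
    "second sample, Parsons": (176.4, 143.4),
    "second sample, Morant": (176.1, 143.6),
}

indices = {
    label: round(cephalic_index(breadth, length), 1)
    for label, (length, breadth) in sample_means.items()
}

for label, value in indices.items():
    print(f"{label}: 100 B/L = {value}")
```

All four values fall in the range 81.0 to 81.5 quoted in the text, whichever observer took the measurements.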

The male and female constants for the sample which we measured in detail are 
given in Table II. There are 28 indicial and angular measurements and the male 
and female means only differ by more than three times the probable error of the 
difference in the cases of 100 H'/L (Δ/p.e. Δ = 3.3), 100 NB/NH, L (3.3), 100 SS/SC
(6.5), 100 EH/G2 (3.2), Oc. I. (5.5), N∠ (3.7), B∠ (3.8) and θ1 (5.2). A mean
sexual difference of the same sign as the one now observed is generally found for
100 SS/SC, 100 EH/G2 and Oc. I. and none of the other differences is markedly
significant. Judging from all the measurements of shape, we may conclude that the
male and female samples represent the same racial type. Comparing coefficients of
variation of absolute measurements and standard deviations of indices and angles,
it is found that the male variability is the greater for 33 characters, the female the
greater for 39 and there is equality in this respect for 5 characters. The differences
exceed three times their probable errors in the case of Biasterionic B (3.4), O1, R (3.6)
and A∠ (4.5). The male and female variabilities are approximately equal as is
generally found. It will be shown below that the series is as homogeneous as any 
other European one that has been described. The sex ratios are of the same order as 
those given for other English series. 
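The ratios Δ/p.e. Δ quoted above can be reproduced from the means and probable errors of Table II. The sketch below assumes that the probable error of a difference between two independent means is the root of the sum of the squared probable errors; the memoir does not state this formula in the present section, so it is an assumption, and the function and variable names are illustrative only.

```python
import math


def ratio_to_probable_error(mean_a, pe_a, mean_b, pe_b):
    """Return |mean_a - mean_b| divided by the probable error of the difference.

    Assumes independent samples, so that p.e.(difference) = sqrt(pe_a^2 + pe_b^2).
    """
    return abs(mean_a - mean_b) / math.hypot(pe_a, pe_b)


# (male mean, male p.e., female mean, female p.e.) read from Table II.
table_ii_rows = {
    "100 H'/L": (75.4, 0.22, 74.3, 0.25),
    "100 SS/SC": (48.0, 0.65, 41.5, 0.77),
    "B angle": (41.7, 0.17, 40.7, 0.20),
    "theta_1": (29.4, 0.19, 27.9, 0.22),
}

ratios = {
    name: ratio_to_probable_error(*row) for name, row in table_ii_rows.items()
}

for name, ratio in ratios.items():
    flag = "exceeds 3" if ratio > 3 else "does not exceed 3"
    print(f"{name}: ratio = {ratio:.2f} ({flag})")
```

Under this assumption the four ratios agree, to the one decimal printed, with the values 3.3, 6.5, 3.8 and 5.2 given in the text.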

Professor Parsons remarks that of 517 adult skulls which he examined 242 were 
above 40 at death and 275 between 20 and 40 years. No particulars relating to 
the ages of individual specimens are given. We attempted to obtain a rough 
estimate of the age constitution at death of the population by recording the state 
of closure of the coronal, sagittal and lambdoid sutures. Remarks on these are 
made in the appended tables of individual measurements and, unless otherwise 
stated, any one of these sutures is open for its whole length. Five other series 
were dealt with in precisely the same way and the results are summarised in 
Table III. It is known that the sutures of male skulls close normally at an earlier 
age than those of female skulls, and the sexual difference is well brought out by 
these figures. It is probable that there are also racial differences in this respect, 
so for present purposes we may restrict the comparison to the Hythe and the three 
London series. The one from Farringdon St. represents a graveyard population of 
the seventeenth century which may have included a number of individuals who 

* This point is obviously an important one as it may influence any conclusions we may reach 
regarding the racial relationships of the Hythe population. It has been found in the case of several 
other series that the index derived from the maximum length and breadth of the horizontal type contour 
differs from the cephalic index found from the calliper measurements of the same skulls by a small 
amount which is generally less than 0.5. For our 112 male crania the contour index is 82.8 and the
cephalic 82.6, and for the 87 female the first is 81.8 and the second 81.9. This confirms the fact that
the male cephalic index is greater than the female, although for most series there is a sexual difference 
of the opposite sign. 



TABLE III. 


The Age Constitutions of the Hythe and other Series estimated from the 
State of Closure of the Principal Calvarial Sutures *. 


      Series                          Sutures open    Sutures beginning to     All sutures    Total
                                                      close or partly closed   closed         examined

♂     Farringdon St. Londoners        19 (12.2%)      100 (64.1%)              37 (23.7%)     156
      Whitechapel Londoners           21 (15.4%)      85 (62.5%)               30 (22.1%)     136
      Spitalfields Londoners          103 (19.3%)     301 (56.3%)              131 (24.5%)    535
      Hythe                           23 (20.9%)      62 (56.4%)               25 (22.7%)     110
      26th-30th Dynasty Egyptians     74 (37.0%)      103 (51.5%)              23 (11.5%)     200
      Negroes (Teita)                 12 (24.5%)      21 (42.9%)               16 (32.7%)     49

♀     Farringdon St. Londoners        94 (46.3%)      80 (39.4%)               29 (14.3%)     203
      Whitechapel Londoners           66 (46.8%)      49 (34.8%)               26 (18.4%)     141
      Spitalfields Londoners          121 (49.4%)     94 (38.4%)               30 (12.2%)     245
      Hythe                           51 (59.3%)      26 (30.2%)               9 (10.5%)      86
      Negroes (Teita)                 38 (62.3%)      19 (31.1%)               4 (6.6%)       61


died in the 1665 plague. The Whitechapel sample also represents seventeenth 
century Londoners who were interred in what was either a clearance pit or a 
pest-field. The Spitalfields skulls are of unknown date, though certainly earlier 
than the others, and they are probably those of victims of a plague or other 
catastrophe. The Hythe series almost certainly represents a graveyard population. 
It is surprising to find that the age constitutions of all four series are closely 
similar. We should expect to find that the relative sizes of the age-groups would 
be very different between a group of people dying from natural causes and another 
which was subjected to an extraordinary mortality owing to a pestilence or 
massacre†. The fact that such differences are not found in the case of the four
English series may mean that both natural and calamitous causes accounted for 
the deaths of certain proportions of the individuals in each of the samples. 
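The percentages of Table III follow directly from the counts in each suture class. As a minimal sketch (the helper name and the tuple layout are introduced here for illustration), the Hythe and Spitalfields rows can be recomputed as follows:

```python
def class_percentages(counts):
    """Return each count as a percentage of the row total, to one decimal."""
    total = sum(counts)
    return [round(100.0 * c / total, 1) for c in counts]


# Counts in the order: sutures open, beginning to close or partly closed,
# all sutures closed (taken from Table III).
hythe_male = (23, 62, 25)
hythe_female = (51, 26, 9)
spitalfields_male = (103, 301, 131)

hythe_male_pct = class_percentages(hythe_male)
hythe_female_pct = class_percentages(hythe_female)
spitalfields_male_pct = class_percentages(spitalfields_male)

print("Hythe males:       ", hythe_male_pct, "of", sum(hythe_male))
print("Hythe females:     ", hythe_female_pct, "of", sum(hythe_female))
print("Spitalfields males:", spitalfields_male_pct, "of", sum(spitalfields_male))
```

The much larger share of female skulls with all sutures open (59.3 % against 20.9 % for the Hythe males) is the sexual difference in age of closure referred to in the text.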

* References to papers describing the London series are given in the Spitalfields paper: the Egyptian
skulls were selected at random from the long series dealt with by Professor Karl Pearson and Miss A.
G. Davin in Vol. XVI of Biometrika, and the negro have been described by Miss Kitson in Vol. XXIII of
the same Journal.

† The first is known in modern times from the Registrar-General's returns, and for 1921 there were
34.1 % of the males dying over 20 who were between 20 and 55 years of age at death and 65.9 % over 55.
The corresponding female percentages are 30.7 and 69.3. A thoroughgoing pestilence or massacre which
did not discriminate between the ages of the victims would give age-groups similar to those of the census
population. For 1921 there were 78.4 % between ages 20 and 55 and 21.6 % over 55 for the males, and
for the females the percentages are 77.6 and 22.4. There would be more than twice as many in the
younger group for the calamitous deaths as for the normal graveyard population. The disproportion
may not be as great as this between a pestilence and a normal sample, since plagues may be expected
to discriminate to some extent between the ages of the victims, but a distinct difference would still be
anticipated.










In 1912 some bones were discovered accidentally on the site of the vanished 
Church of St Nicholas at Hythe. Three of these were removed to the ambulatory of 
St Leonard’s and the Rev. H. D. Dale, then Vicar, says that they were quite brown 
when disinterred and that they have gradually become whiter since*. We had 
concluded before reading this account that the peculiar whiteness of the bones in 
the ossuary was due to the fact that they had been exposed to sea air there. The 
skulls from the ground of St Nicholas are light brown, and this appears to 
distinguish them from all in the original collection except No. 1001 which is 
similarly discoloured. Only one, which is probably a female, is complete enough to 
give the cephalic index and this has the high value of 87.9. The type appears to
be very similar to that of the long series. Among the skulls which we examined in 
detail there are a few exceptional ones which bear a more or less close resemblance 
to the type with a peculiarly low vault and markedly retreating frontal bone which 
is found frequently among the three seventeenth century London series. Such are 
the male specimens 132, 772, 808, 823 and 825 and the female 660, 806 and 849. 
There is no sufficient justification, however, for supposing that these skulls
represent a different race from the others. It seems that there is not a single specimen
in the whole series which need be segregated on that account. 

Since the Hythe is an English series, its most marked peculiarity is its high 
cephalic index. If the present-day population of the town is descended in the 
main from that earlier one then we should expect to find that its mean index is 
also peculiar. No head measurements of Kentish people appear to have been 
published. We are indebted to the Headmaster of the county school at Hythe for 
permission to measure the head length and breadth of 49 boys between the ages of 
6 and 14. These were selected as being, as far as possible, representative of the 
families resident in the town for several generations†. Their mean cephalic index
is 79.57 ± .29 and the standard deviation is 3.03 ± .21. The last is not an unusual
value and the sample may be supposed as homogeneous as most living ones which
have been accepted as representing a single racial type. It is known that the index
is uncorrelated with age and it may be compared with several means which have
been given for English series from other parts of the country‡. The value found
for 2313 boys was 78.87 ± .045 and the mean for the Hythe boys differs from this
by less than 2.5 times the probable error of the difference. The means for the
longest male English living series available, comprising 9 based on 100 or more
individuals, all lie between 78 and 79. The index for the living head is known to be
about two points greater than that for the skull, and for the seventeenth century
London series from Whitechapel, Farringdon St. and Moorfields the male values
found are 74.3, 75.4 and 75.5 respectively. The living and dead populations are
thus similar to one another in this respect, although the former appears to conform 

* St Leonard's Church, Hythe, from its Foundation, with some Account of the Life and Customs of the
Town of Hythe from ancient Sources, 1931, p. 62.

† Boys from the barracks are also taught in the school but none of these was measured.

‡ Karl Pearson and L. H. C. Tippett: “On Stability of the Cephalic Indices within the Race,”
Biometrika, Vol. XVI. 1924, pp. 118-138. The fact that the cephalic index is uncorrelated with age
after the fifth year, at least, is demonstrated in the above paper.





to a type which is slightly more brachycephalic than that of the latter. The Hythe 
series has a mean cephalic index of the order 82 which would correspond with one 
of about 84 for the living. Instead of this we find a value of 79.6. As far as can
be told from this evidence, only a small part of the present-day population of Hythe 
is descended from the mediaeval population of the town, and the modification 
which the type has undergone has been in the direction of that one to which almost 
all modern Englishmen conform. 
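The probable errors attached to the constants for the 49 Hythe boys, and the comparison with the 2313 English boys, can be checked numerically. The sketch below assumes the usual normal-theory expressions, p.e. of a mean = 0.6745 σ/√n and p.e. of a standard deviation = 0.6745 σ/√(2n), and a root-sum-of-squares probable error for the difference of two independent means. These formulae are not written out in this section of the memoir, so they are assumptions, and all names in the code are illustrative.

```python
import math

PROBABLE_ERROR_FACTOR = 0.6745  # ratio of probable error to standard error


def pe_of_mean(sd: float, n: int) -> float:
    """Probable error of a mean from a sample of size n (assumed formula)."""
    return PROBABLE_ERROR_FACTOR * sd / math.sqrt(n)


def pe_of_sd(sd: float, n: int) -> float:
    """Probable error of a standard deviation (assumed formula)."""
    return PROBABLE_ERROR_FACTOR * sd / math.sqrt(2 * n)


# Hythe schoolboys: mean cephalic index 79.57, standard deviation 3.03, n = 49.
hythe_mean, hythe_sd, hythe_n = 79.57, 3.03, 49
hythe_pe_mean = pe_of_mean(hythe_sd, hythe_n)
hythe_pe_sd = pe_of_sd(hythe_sd, hythe_n)

# Comparison series of 2313 English boys: 78.87 with probable error .045.
english_mean, english_pe = 78.87, 0.045
difference = hythe_mean - english_mean
ratio = difference / math.hypot(hythe_pe_mean, english_pe)

print(f"p.e. of mean = {hythe_pe_mean:.2f}, p.e. of s.d. = {hythe_pe_sd:.2f}")
print(f"difference = {difference:.2f}, ratio to its probable error = {ratio:.2f}")
```

On these assumptions the probable errors come out at .29 and .21, as printed, and the ratio of the difference to its probable error is a little under 2.4, which agrees with the statement that it is less than 2.5.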

(5) Remarks on Anomalies. The Hythe skulls were examined for anomalies in
exactly the same way as the Spitalfields have been. Our samples comprised 112
male and 87 female specimens, all of which are nearly complete, and, unless
otherwise stated, it may be assumed that these totals were available for examination in
the case of a particular anomaly. A few anomalous skulls are shown in glass cases 
in the ambulatory of St Leonard’s Church, but, since the series examined in detail 
forms only a small part of the total there, this fact is not likely to have influenced 
our frequencies to any appreciable extent. 

(a) Sutures. Remarks on the state of closure of the coronal, sagittal and lambdoid
sutures are appended to the tables of individual measurements. If no remark
is made the suture is open for its whole length. The rough appreciation of the age
distributions for the two sexes which can be derived from this evidence has already
been considered. The sagittal suture was generally the first, or one of the first, to
close. An exception to this rule was noted only in the case of one male skull
(No. 1042), which has the coronal obliterated, the sagittal closed and the lambdoid
closing. There are five exceptions among the females (Nos. 692, 745, 758, 832 and
899). Only one such case, and that a male showing a definitely anomalous state of
synostosis, was noted for the Spitalfields series. Professor Parsons states that
6.5 % of the Hythe skulls he examined were scaphocephalic, but he remarks that
he was unable to associate this condition with premature closing of the sagittal
suture, and it is hence not clear how he has defined the anomaly. No example of
scaphocephaly being due to the sagittal suture closing well before the others occurred
in our sample. It was usual for the coronal and lambdoid sutures to begin closing
after the sagittal and at approximately the same time as one another. The following
specimens have complete, or nearly complete, metopic sutures, LF + RP or RF + LP
denoting the extent of contact between the frontal and parietal bones from the
sagittal to the metopic suture, and the measurement the length of the common
suture between them. Males: 132 (LF + RP, 1.5), 627 (LF + RP, 9.7), 644
(LF + RP, 7.0), 700 (LF + RP, 10.1), 726 (LF + RP, 11.6), 782 (LF + RP,
11.3), 827 (LF + RP, 4.1), 895 (LF + RP, 5.2), 937 (LF + RP, 7.0), No. a
(LF + RP, 1.0?), 1012 (LF + RP, 4.7), 696 (RF + LP, 3.2), 844 (RF + LP, 6.5),
860 (RF + LP, 5.7?), 1074 (RF + LP, 6.5). Females: 612 (LF + RP, 5.2), 620
(LF + RP, 2.9), 655 (LF + RP, 8.5), 744 (LF + RP, 3.7), 826 (LF + RP, 3.4),
853 (LF + RP, 4.2), 900 (LF + RP, 3.8), 1014 (LF + RP, 2.9), 969 (RF + LP, 2.1),
1050 (RF + LP, 5.0). All other long series examined in this way have shown that
contact is most frequently made between the left frontal and right parietal bones. 





The percentage frequency of occurrence of the suture is 13.4 for the males and 11.5
for the females. It has generally been found that the females are rather more
affected than the males, and this was so for the Spitalfields series giving percentages
of 11.3 and 8.4. Without distinguishing the sexes, Professor Parsons found 52 with
“unclosed or partly closed metopic sutures” among 590 Hythe skulls, and the
resulting percentage of 8.8 is apparently lower than ours owing to the fact that we
have included cases where the suture was completely closed, or even partly 
obliterated. Regarding the state of closure of the metopic suture, this is less 
advanced than that of the sagittal in eight cases among the males, more advanced in 
two and in the same state in five; while among the females there is one case less 
advanced, two more advanced and seven in the same state. When the metopic 
suture persists to an adult stage it appears to close at about the same age as the 
sagittal. It has been shown for several series that some breadth measurements of 
metopic skulls are appreciably larger than those of non-metopic specimens on the 
average. Comparisons are made in the following table: 


      Series                    Internal      Minimum      Maximum      Maximum      Biasterionic
                                bi-orbital    frontal      frontal      parietal     breadth
                                breadth       breadth      breadth      breadth
                                (IOW)         (B')         (B'')        (B)

♂     Metopic skulls (A)        100.1 (15)    102.1 (14)   129.0 (13)   148.0 (15)   113.4 (13)
      Non-metopic skulls (B)    97.8 (95)     98.1 (95)    124.2 (89)   146.5 (97)   112.6 (84)
      A − B                     + 2.3         + 4.0        + 4.8        + 1.5        + 0.8

♀     Metopic skulls (A)        96.3 (10)     96.8 (10)    121.5 (10)   139.5 (10)   107.4 (10)
      Non-metopic skulls (B)    94.0 (76)     94.4 (77)    118.8 (74)   140.4 (77)   108.5 (77)
      A − B                     + 2.3         + 2.4        + 2.7        − 0.9        − 1.1


The amounts by which the means for metopic skulls exceed those for the non- 
metopic series arrange these breadth measurements in the same order for the two 
sexes and all the male differences are greater in a positive sense than the female. 
Precisely similar conditions were found for the Spitalfields series, though all the 
differences in this case are less in a positive sense than the corresponding ones for 
the Hythe skulls. The failure of the frontal suture to close before maturity is 
reached is apparently associated with an increased breadth of the frontal region as 
a whole, while the most marked increase is in the region of the coronal suture. 
Traces of the suture between the ex- and supra-occipital bones were found on two
male skulls (619 R, 644 R and L) and three female (724 R, 853 L, 870 L) and the
complete suture on the left side could be traced on one female specimen (867, 
Plate VII c). Another female skull (745) has the coronal and lambdoid sutures 
obliterated, the temporal squamae completely fused to the parietal bones and the 
calvarial walls falling in owing to senescence, but the anterior half of the sagittal 
suture is still open. No traces of horizontal sutures across the malar bones were 
found. 
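The statement that the differences A − B arrange the five breadth measurements in the same order for the two sexes can be verified from the differences given in the table. The sketch below is illustrative only; the dictionary keys are descriptive labels introduced here, and the values are the A − B rows as read from the table.

```python
# A - B (metopic mean minus non-metopic mean, in mm.) for each breadth.
male_differences = {
    "internal bi-orbital": 2.3,
    "minimum frontal": 4.0,
    "maximum frontal": 4.8,
    "maximum parietal": 1.5,
    "biasterionic": 0.8,
}
female_differences = {
    "internal bi-orbital": 2.3,
    "minimum frontal": 2.4,
    "maximum frontal": 2.7,
    "maximum parietal": -0.9,
    "biasterionic": -1.1,
}


def ranked(differences):
    """Return the measurement names ordered from largest to smallest A - B."""
    return sorted(differences, key=differences.get, reverse=True)


male_order = ranked(male_differences)
female_order = ranked(female_differences)

print("male order:  ", male_order)
print("female order:", female_order)
print("same order for both sexes:", male_order == female_order)

# No male difference falls below the corresponding female one.
male_not_smaller = all(
    male_differences[name] >= female_differences[name] for name in male_differences
)
print("male differences at least as large as female:", male_not_smaller)
```

In both sexes the maximum frontal breadth shows the largest excess and the biasterionic breadth the smallest, which is consistent with the authors' remark that the most marked increase lies in the region of the coronal suture. For the internal bi-orbital breadth the male and female differences are equal as printed, so "greater" holds only in the sense of "not less" for that character.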





(b) Supernumerary Bones. More than 50 % of the Hythe skulls in the case of
both sexes have one or more wormian bones in the lambdoid suture. As usual these
are very variable in number and size. A male specimen (843) has an ossicle of
bregma extending backwards from the coronal suture in the line of the sagittal
suture for a length of 23.5 mm. and having a maximum breadth of 13 mm. There
are also a few small ossicles in the coronal or sagittal sutures of other male skulls. 
There is no female specimen with an ossicle of bregma, but No. 875 has a wormian 
bone (15 x 7) in the left coronal suture and No. 670 has a large wormian (29 x 28) 
in the same position and also a supernumerary bone 27 mm. long between the left 
temporal squama and parietal bone above the auricular passage. There are 110 
males on which a bregmatic ossicle might have been observed, so the percentage 
frequency is 0.9 for males, and for females it is zero. Professor Parsons found 6 cases
in nearly 600 Hythe skulls, 3 belonging to either sex, and these frequencies are not 
unusual. Epipteric bones were counted only as such when they were found to have a 
maximum length of over 3 mm. There are 24 male specimens having at least one 
of these supernumerary bones at the pterion at the right or left, or both, sides out 
of 75 possible cases. Counting multiple cases, there are 20 on either side. For the 
females there are 17 affected out of a possible 66, there being 14 bones on the 
right and 17 on the left. Among the 75 male skulls having the sutures at the 
pterion open, there is one with fronto-temporal articulation on both sides (1070); 
among the 66 female skulls there is one with articulation on both sides (967) and 
another showing a close approach to the condition on the left (655). These
frequencies appear to be quite normal for a European series. True inter-parietal
bones were found in 5 out of 112 male skulls (4.5 %) and 3 out of 84 female
(3.6 %). The male and female percentages for the Spitalfields series were
respectively 1.6 and 0.5. Among an unspecified number, but probably about 550, of the
Hythe skulls belonging to both sexes Professor Parsons found 13 (ca. 2.4 %) with
true inter-parietal bones, so the percentages for our smaller sample may be high
owing to a fortuitous selection. Among the male specimens affected, one (1031) 
shows the transverse occipital suture only open, though the lambdoid suture is 
partly obliterated, another (894) shows the same simple inter-parietal form, 
one (1049) has the two ossa triangularia only separate, one (903) has the complete 
tripartite form, except that there is no suture between the os pentagonale and left 
os triangulare , and the last (1071) has the left os triangulare and the os pentagonale 
only separate, but with no suture between them. One of the affected female 
specimens (744) has the right os triangulare only separate, another (818) has the 
os pentagonale only separate and the last (875) shows the complete form except 
that there is no suture between the right os triangulare and the os pentagonale.
Traces of the transverse occipital suture near the asteria were observed on a few
other skulls. A large symmetrically placed and undivided ossicle of lambda (os
epactal) was found on one male specimen (970) and two female (620 and 764).

(c) Teeth. Remarks in the tables of individual measurements refer to the
number of teeth lost before death and to the state of wear of those remaining in 
situ, the vast majority having fallen out after death. Many are markedly worn and 




only 28 showing signs of caries were found. The absence of both third molars was 
noted in the case of 16 male skulls (630, 648, 666, 674, 710, 726, 733, 746, 792, 
827, 862, 912, 942, 1031, 1037 and 1071) and 14 female (670, 758, 762, 826, 841,
857, 867, 870, 907, 908, 911, 939, 955 and 1004). There are also four male skulls
(632, 725, 732 and 1049) and four female (613, 697, 702 and 1002) with no third 
molar on the right side, and two female (727 and 1092) with no third molar on the left 
side. There seems to be no significant sexual difference between these frequencies. 
It is of interest to inquire whether the palate is of an unusual size when the third 
molars fail to develop. The means given below are respectively for the skulls with 
no third molar on either side and for all the others: 


      Series                    Palatal length    Palatal breadth    Palatal height
                                (G1)              (G2)               (EH)

♂     No third molars           49.7 (14)         41.5 (12)          11.4 (9)
      Third molars erupted      51.0 (82)         41.8 (39)          12.2 (38)

♀     No third molars           46.8 (11)         39.3 (9)           9.5 (9)
      Third molars erupted      47.1 (53)         39.4 (35)          10.6 (33)


These figures suggest that the palate is slightly smaller on the average for the 
skulls having no third molars, but larger numbers would be required in order to 
prove that this difference is significant. A few dental anomalies were observed. 
A male skull (646, Plate VII b) has the socket for the right canine divided and
what appears to be a supernumerary tooth is erupting behind the sockets of the 
incisors on the same side. Two female specimens (620, Plate VII a, and 899) have 
the canine erupting behind the second incisor on the left; another (955) has 
neither canines nor third molars erupted and another female (758) has no third 
molars, but there is a socket for a supernumerary tooth (lost post-mortem) between 
the canine and first premolar on the right side. 


(d) The Base of the Skull. The relative sizes of the jugular foramina were
compared and the following frequencies are found, JR or JL denoting that either
the right or left side is the greater:

          JR              J =             JL

♂         61 (54.5%)      15 (13.4%)      36 (32.1%)
♀         51 (58.6%)      14 (16.1%)      22 (25.3%)

The balance in favour of the right side has been found for every other series 
examined in this way and very similar percentages were given by the Spitalfields 
skulls. As is usually observed in the case of a long series, there are various forms 
and sizes of precondyles present and the percentage frequency of the condition is 







of no value unless it is defined in some arbitrary way. Two distinct processes were 
found on three male (718, 808 and 827) and five female (670, 719, 724, 818 and
864) specimens; five male (885, 869, 891, 970 and 997) have a single process,
whether placed symmetrically with regard to the median sagittal plane or not, and
another male (632) has two small precondyles connected by a bridge of bone. One
Spitalfields skull (see Plate VII b in that paper) was found with a small spine
projecting into the foramen posterior to the basion and there are two Hythe male
(626 and 795) and two female (612 and 624) skulls showing the same condition.
A female specimen (771, Plate VI d) has an articular surface at the anterior margin
of the foramen magnum; another (1004, Plate VI a) has an irregular surface there
and a fossa pharyngea (the only one noted) 7 mm. deep, and another female (719)
has a par-occipital process on the right side in the form of a blunt spine. An unusual
condition is shown by a male specimen (1071, Plate VI c) having the left side of
the basi-occipital normal, but the right considerably enlarged. A female (659,
Plate VI b) has a clear opisthial notch.

(e) Other Anomalies. Cases of tympanic perforation were recorded and any 
hole in the plate through which a wire 0.28 mm. in diameter could be passed so
that it entered the auricular passage was counted as such. As usual the holes 
were found to vary considerably in number and size. Among 112 males there are 
16 affected (616, 632, 640, 644, 646, 685, 754, 792, 815, 825, 827, 863, 895, 971, 
974 and 997), the total numbers of perforations being 17 on the right side and 16 
on the left, and among 85 females there are 30 affected (610, 620, 636, 654, 660, 
692, 702, 713, 715, 719, 724, 737, 744, 753, 758, 803, 806, 817, 834, 856, 857, 870, 
939, 957, 960, 1014, 1047, 1054, 1088 and 1096), the total numbers of perforations
being 36 on the right side and 45 on the left. The percentage frequency is thus
14.3 for the males and 35.3 for the females, while a negro series from Kenya
Colony examined in precisely the same way* gave 46.1 % for the males and
53.3 % for the females. This condition appears to exhibit distinct racial and
sexual differences. Although it is generally supposed to be influenced by the age 
of the individual, this supposition was not confirmed in the case of the negro series 
referred to. The table below gives the percentage with perforated tympanic plates 
occurring in each of the age groups distinguished from the condition of the 
principal sutures, the totals for these groups having been given in Table III 
above: 




          Sutures open    Sutures beginning to close    All sutures closed
                          or partly closed

♂         13.0 %          14.5 %                        16.0 %
♀         41.2 %          34.6 %                        0.0 %


* Elisabeth Kitson: “A Study of the Negro Skull with Special Reference to the Crania from Kenya
Colony,” Biometrika, Vol. XXIII. 1931, pp. 271-314.






These figures again suggest that there is little, if any, association between the 
age of the individual at death and the condition in question: it is certainly possessed 
by some young adults and some aged persons are without it. A male specimen 
(895, Plate VII d) has a large perforation, having roughly the form of a circle 
0.8 mm. in diameter, on the left side and the outer border of the element is formed
by a thin and irregular bridge of bone. The articular surface of the glenoid fossa
on the same side is enlarged and roughened as the result of arthritis. This surface
on the right is only slightly roughened and the tympanic plate is normal. A female
specimen (1054, Plate VII e and f) has quite anomalous auricular passages and the
forms on the two sides are very similar. Each tympanic plate has six perforations
and thin and very irregular outer edges. The posterior walls of the passages are
also irregular and defective in parts. Such an irregularity appears to be quite rare
and it may be remembered that a male Spitalfields skull showed the complete 
absence of the posterior wall of the right auricular passage (see Plate VI d in that 
paper). The only exostoses having their maximum length more than 6 mm. were 
one on the left parietal bone of a male specimen (941), one on the occipital bone 
near the right mastoid process on a female (900) and one on the left parietal of a 
male (No. x). The last is the largest; it has a circular base with a diameter of 
35 mm. and a height of 11 mm. There are no marked examples of a post-coronal 
depression, but a peculiarity which may be called a post-coronal eminence was
noted in the case of two male (663 and 929) and three female (753, 806 and 828) 
specimens. The most marked of these (929) is shown in Plate V a. There is a 
distinct eminence not immediately behind the coronal suture, but nearer the normal 
position of the vertex. There are a few examples of a low median sagittal ridge and 
these occur more frequently among the metopic than among the non-metopic skulls. 
A few specimens have unusually retreating frontal bones, but these are
exceptional cases, and again a few have an asymmetrical calvaria, though this condition
is never exaggerated. Two female skulls (875 and 967, Plate V b) have an
unusually globular form of calvaria, but these appear to be extremes of normal
variation in that direction rather than examples of hydrocephaly. The only distinct
traces of disease, apart from dental caries, are the arthritic glenoid surfaces on two 
crania, one male (895, Plate VII d) and one female (1010). Professor Parsons says: 
“Ten or twelve examples of syphilitic lesions were found in the form of ulcerations, 
necroses, gummata and periostitis. One skull with a large gummatous heaping up 
of bone has been shown to countless visitors in the past as an instance of a healing 
wound.” The evidence for the occurrence of syphilis appears to have been derived 
principally from the long-bones. 

(6) Comparison between the Hythe and Spitalfields Series. In the paper
dealing with the Spitalfields series it has been shown that there is a peculiarly
close resemblance between that type and the one represented by the Hythe skulls.
Precisely similar methods of technique have been used in describing the two
samples and it will be profitable to make a somewhat detailed comparison between
them. The frequencies with which anomalous conditions occur can give no definite


measure of the racial affinities of two series in the present state of our knowledge 
and large numbers would be needed in order to give reliable determinations. As 
far as can be seen, the two English series are very similar to one another in these 
respects. The frequency of occurrence of the metopic suture is unusually high for 
both and both have examples of malformed dentitions and auricular passages. 

Using coefficients of variation of the absolute measurements and standard 
deviations of the indices and angles, the variability of our Hythe sample is compared 
in the table below with that of the Spitalfields and with those of two other 
European series: 


Series                          No. of characters   No. of characters    No. of characters   No. of
                                with greater        with variabilities   with lesser         characters
                                variabilities       equal to             variabilities       compared
                                than Hythe          Hythe                than Hythe

Farringdon St. English    ♂     46 (68.7%)          1 (1.5%)             20 (29.9%)          67
(17th century)            ♀     44 (65.7%)          0 (0.0%)             23 (34.3%)          67

Spitalfields              ♂     46 (63.0%)          2 (2.7%)             25 (34.2%)          73
                          ♀     18 (50.0%)          2 (5.6%)             16 (44.4%)          36

Basque                    ♂     19 (57.6%)          0 (0.0%)             14 (42.4%)          33
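
The percentages in this table follow directly from the tallies. As a check, the following minimal Python sketch recomputes them from the printed counts. The dictionary layout and the helper name `shares` are illustrative choices and are not part of the paper.

```python
# Counts as printed in the table: (greater, equal, lesser) than Hythe.
# The keys and the helper name are illustrative only.
TALLIES = {
    ("Farringdon St.", "male"): (46, 1, 20),
    ("Farringdon St.", "female"): (44, 0, 23),
    ("Spitalfields", "male"): (46, 2, 25),
    ("Spitalfields", "female"): (18, 2, 16),
    ("Basque", "male"): (19, 0, 14),
}


def shares(counts):
    """Return the number of characters compared and each tally as a percentage."""
    total = sum(counts)
    return total, tuple(round(100 * c / total, 1) for c in counts)


for (series, sex), counts in TALLIES.items():
    total, pct = shares(counts)
    print(f"{series:15s} {sex:6s} n={total:2d}  "
          f"greater {pct[0]}%  equal {pct[1]}%  lesser {pct[2]}%")
```

In every row at least half of the characters compared are more variable in the other series than in the Hythe, which is the basis for the suggestion that the Hythe series is the least variable.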


Comparisons made in this way are far from exact since the number of characters 
used is not constant and different measurements of the same feature are included 
in them, such as the different nasal heights or orbital breadths, but there is a clear 
suggestion that the Hythe series is less variable than these other three. Similar 
comparisons of English series and the Basque have been made previously and 
their relative variabilities give the decreasing order: Farringdon St., Whitechapel, 
Spitalfields, Basque, Hythe. The differences between adjacent series 
according to this grading are certainly very small and they are, perhaps, not larger 
than those which might arise from random sampling. It may be noticed that the 
evidence for racial homogeneity furnished by a considerable number of characters, 
as in the present instance, may be appreciably different from that which a single 
measurement suggests. The standard deviations of the male cephalic indices for 
the five series compared above and in the order given there are: 3.48 ± .14, 
3.26 ± .14, 3.84 ± .11, 2.68 ± .15 and 3.69 ± .17. If this character had been considered 
alone, our conclusions would hence have been substantially different and it 
is evident that the only safe method of comparing the relative variabilities of 
several series is one which takes into account the constants relating to a number 
of different measurements. The samples with which we are dealing are generally 
small ones for statistical purposes and it is not surprising to find that the majority 
of the differences observed are insignificant. Of the 73 characters for which constants 
of variability can be compared in the case of the Hythe and Spitalfields 
male series, there are only four for which the difference exceeds three times its 




B. N. Stoessiger and G. M. Morant 




probable error. These are: SS (Δ/p.e. Δ = 3.1), 100 NB/NH, L (3.5), GB (4.4) and 
100 G₂/G₁ (3.2). The Hythe constant is the less in the first two cases and the 
greater in the others. For the female series there are only 36 comparisons possible, 
since constants of variability were not calculated for fewer than 30 skulls, and only 
one difference is significant, viz. fmb (3.5), the Spitalfields constant being the 
greater. It was found that the Spitalfields male coefficient of variation for GB was 
an outstandingly low value (3.98 ± .21); the Hythe value (5.45 ± .26) is close to 
several others available and there are no variabilities for the latter series which are 
extreme. 
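
The criterion used here, a difference exceeding three times its probable error, can be illustrated with the two coefficients of variation quoted for GB (3.98 ± .21 and 5.45 ± .26). The sketch below assumes that the two samples are independent, so that the probable errors combine in quadrature; that rule and the function name are assumptions of this example rather than statements taken from the paper.

```python
import math


def ratio_to_probable_error(a, pe_a, b, pe_b):
    """Difference |a - b| divided by the probable error of the difference.

    Assumes independent samples, so that the probable error of the
    difference is sqrt(pe_a**2 + pe_b**2).
    """
    return abs(a - b) / math.hypot(pe_a, pe_b)


# Coefficients of variation for GB: Hythe 5.45 +/- .26, Spitalfields 3.98 +/- .21.
gb_ratio = ratio_to_probable_error(5.45, 0.26, 3.98, 0.21)
print(round(gb_ratio, 1))   # 4.4
print(gb_ratio > 3)         # True: significant by the three-times rule
```

The result agrees with the ratio of 4.4 listed for GB among the four significant male differences.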

The Hythe male and female means are given in Table II. The following lists 
give the ratio of the difference from the corresponding Spitalfields mean to the 
probable error of the difference for all out of the possible 77 cases having this 
ratio greater than three: 

Male: C (3.3), F (4.1), F.V. L (5.7), L (6.1), B (6.5), B″ (4.2), Biasterionic B 
(4.8), H (4.6), S₂′ (5.7), S₂ (4.2), Q′ (4.7), Bregmatic Q′ (3.4), Broca's Q′ (3.6), 
fml (3.1), fmb (3.6), J (5.1), NH, R (3.7), NH, L (4.3), NH′ (3.1), DC (4.3), 
DA (5.9), G₁ (4.3), G₁′ (5.1), 100 B/L (11.1), 100 H′/L (6.4), 100 H/L (7.2), 
100 B/H′ (3.4), 100 (B−H′)/L (3.7), 100 NB/NH, R (3.1), 100 NB/NH, L (3.4), 
100 NB/NH′ (3.1), 100 O₂/O₁, R (3.7), 100 O₂/Lacr. O₁ (3.0). 

Female: C (4.2), B (5.6), B″ (3.4), Biasterionic B (4.0), H (4.3), U (3.6), 
Broca's Q′ (3.1), fmb (5.5), DA (3.8), SS (3.1), G₂ (3.1), O₁, R (5.7), O₁, L (5.5), 
O₂, R (4.3), O₂, L (3.1), 100 B/L (5.4), 100 B/H′ (3.1), 100 (B−H′)/L (3.4), 
100 fmb/fml (3.2). 

There are 33 characters showing a significant difference between the male 
means and only 19 between the female. This discrepancy is merely due to the fact 
that the male means are based on larger numbers of individuals than the female, 
and it will be shown below that there is no reason to suppose that the types for 
one sex resemble one another more closely than those for the other. In comparisons 
of this kind it is not at all unusual to find that the difference between two means 
exceeds 50 or 100 times its probable error in the case of the most divergent 
characters, and the fact that the largest ratio above is as low as 11.1 indicates that 
the types are very similar. If samples of 10—20 individuals had been the only 
material available for one sex, it is likely enough that no single character could 
have been proved to indicate a significant difference. Between the male means the 
highest ratio considered is for the cephalic index (11.1), and the only other values 
greater than 6 are for the height-length indices and the calvarial maximum length 
and breadth. Measurements of the calvaria are able to differentiate the types more 
effectively than those of the face. The following conclusions may be deduced from 
a comparison of the two series of means, the more reliable male constants being 
considered more than the female. The Hythe type has the lesser length, but the 
greater breadth and height of the brain-box and its capacity and transverse arc are 
hence the greater. It is of interest to note that the maximum calvarial breadth 
shows a more significant difference than any other breadth measurement, though 


the bizygomatic differentiates the types nearly as well. The difference between the 
total calvarial lengths is associated with that between the median sagittal lengths 
of the parietal bones from bregma to lambda, but the lengths of the frontal and 
occipital bones in the same plane cannot be distinguished. The Hythe type has 
the greater length and breadth of the foramen magnum. Significant differences 
between the calvarial indices involving the length, breadth and height are to be 
expected from these relations and the Hythe male cephalic index (82.6 ± .24 (112)) 
is three points greater than the Spitalfields (79.4 ± .16 (274)). The only clear 
distinctions between the facial characters are those associated with the greater 
nasal height and hence lower nasal index for the Hythe series and again with its 
lesser dacryal arc and chord. There is also a suggestion that the Hythe type has 
the greater orbital height, palatal length and orbital index. It should be realised 
that the absolute differences between the Hythe and Spitalfields means are in all 
cases very small. The calvarial breadths show a more significant difference than any 
other chords, but the Hythe male mean only exceeds the other by 2.6 mm. which 
is 1.8% of the measurement. If two skulls were available possessing all the 
mean measurements of the two series it is unlikely that any distinction between 
their facial skeletons could be appreciated from a visual examination. The only 
clear difference between the brain-boxes would be in the case of the character 
expressed by the cephalic index, the maximum lengths and breadths on which it 
depends only differing slightly but still in opposite senses, so that there is an 
appreciable difference in the form of the outline seen in norma verticalis. 

A comparison of the Spitalfields mean measurements with those available for 
other European series showed that the London population was characterised by an 
extremely small nasal height and an extremely high nasal and low orbital index. 
For these three characters the Hythe are close to the Spitalfields values and 
nearer than they are to the inter-racial mean. The London type was also found to 
have a shallower and rounder palate (judging by the indices 100 EH/G₂ and 
100 G₂/G₁) and a flatter nasal bridge (judging by the index 100 DS/DC) than any 
other European race, though the comparative material for these characters is very 
inadequate. Neither in these ways, nor in any other which can be estimated from 
the measurements, is the Hythe type peculiar, and most of its characters are 
typically European. 

Comparisons by the method of the coefficient of racial likeness may now be 
considered. The Farringdon St. standard deviations were used for this purpose, 
since several hundred coefficients have previously been calculated with them, and 
slightly lower values are hence to be expected than if the constants for the more 
homogeneous Spitalfields or Hythe series were used. Between these two a crude 
male coefficient of 5.18 ± .17 is found for 31 characters. The Spitalfields n is 163.6 
and the Hythe 101.3, giving a reduced coefficient of 4.14 ± .14. The female palatal 
breadth and index were omitted in calculating the female values since the Spitalfields 
means are based on fewer than 10 crania. For the remaining 29 characters 
the crude coefficient is 3.08 ± .18. The Spitalfields n is 50.6 and the Hythe 81.1, 





giving a reduced coefficient of 4.93 ± .28. The difference between the male and 
female crude coefficients is markedly significant, and this was to be expected since 
the numbers of crania for the two sexes differ markedly. Correction is made for 
this difference by reducing the constants, and the difference of 0.79 found after this 
has been done is only 2.5 times its probable error. In the case of each sample we 
have every reason to believe that the male and female series represent precisely 
the same population, and the absolute racial divergencies between the male means 
and between the female means should hence be of the same order. The male means 
will give the better measure of this divergence since they represent larger numbers 
than the female. The α's found in computing the coefficients give another measure 
of the significance of a difference in the case of each character compared. The only 
values greater than 10 in the case of the males are: 100 B/L (α = 67.27), 100 H′/L 
(19.67), B (16.45), L (15.78) and J (11.09); and in the case of the females 100 B/L 
(22.13), O₁, R (16.95), B (16.11) and fmb (14.15). The outstanding difference is the 
one between the cephalic indices. 
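
The arithmetic of this paragraph can be retraced with a short Python sketch. It assumes the usual form of the coefficient of racial likeness (the mean of the α's less one, with probable error .67449 √(2/M) for M characters) and a reduction that rescales the crude value to two samples of 100; neither formula is written out in this section, and the function names are illustrative. The standard deviation of 3.48 used for the cephalic index is the Farringdon St. value as read from the damaged print, so it should also be treated as an assumption.

```python
import math

PE_FACTOR = 0.67449  # probable error = 0.67449 x standard error (normal theory)


def alpha(n1, n2, mean1, mean2, sigma):
    """Contribution of one character to the coefficient of racial likeness."""
    return (n1 * n2) / (n1 + n2) * ((mean1 - mean2) / sigma) ** 2


def crude_coefficient(alphas):
    """Mean alpha minus one, with its probable error for M characters."""
    m = len(alphas)
    return sum(alphas) / m - 1.0, PE_FACTOR * math.sqrt(2.0 / m)


def reduce_coefficient(crude, pe, nbar1, nbar2):
    """Rescale a crude coefficient to the value expected for two samples of 100."""
    factor = 50.0 * (nbar1 + nbar2) / (nbar1 * nbar2)
    return crude * factor, pe * factor


# Crude coefficients and mean sample sizes quoted in the text.
male = reduce_coefficient(5.18, 0.17, 163.6, 101.3)
female = reduce_coefficient(3.08, 0.18, 50.6, 81.1)
print(male, female)

# Difference of the printed reduced coefficients (4.93 - 4.14 = 0.79)
# against its probable error, assuming independent male and female samples.
print(0.79 / math.hypot(0.14, 0.28))

# Cephalic index: means 82.6 (n = 112) and 79.4 (n = 274);
# sigma = 3.48 is an assumed reading of the Farringdon St. constant.
print(alpha(112, 274, 82.6, 79.4, 3.48))
```

Under these assumptions the male reduction returns 4.14 and the female about 4.94 (printed as 4.93, presumably from unrounded constants), the difference is close to 2.5 times its probable error, and the α for the cephalic index comes out near the printed 67.27.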

A final comparison may be made between the type contours of the two very 
similar series. The mean measurements used in constructing the Hythe figures are 
given in Tables IV—VI and the types are Figs. 3—8. The Hythe contours for 
both sexes are based on adequate numbers of crania, but there were only 24 female 
Spitalfields skulls which could be used for this purpose. The tracings provided 
greatly facilitate the comparisons. Superposing the transverse sections, it is found 
that the Spitalfields outlines fall entirely inside the Hythe. The difference 
between the heights is very small in the case of the male figures and the maximum 
divergencies are found between the 4th and 5th parallels where both types reach 
their maximum breadths. The maximum distance between the outlines here is 
2.0 mm. on the right side and 3.2 on the left. Rather larger divergencies are found 
between the female sections. The horizontal sections are most conveniently superposed 
by making the points F coincide and the FO axes cover one another. The 
Spitalfields male outline then falls inside the Hythe for three-quarters of its length, 
but the two cross between the 9th and 10th parallels and there is a difference in 
length of 2.9 mm. The maximum breadths of both types are, as usual, between the 
6th and 7th parallels, and it is in this region that the maximum divergencies are 
found if the sections are moved until the two F's and the two O's are 1.45 mm. 
from each other. The maximum divergence is then 2.1 mm. on the right side 
and 2.0 mm. on the left. The female Spitalfields type falls entirely within the 
Hythe, the difference between the lengths being 1.0 mm., and the maximum 
divergencies between the outlines fall between the 6th and 8th parallels. It is 
clear from the comparison of these transverse and horizontal sections that the 
greatest absolute difference between the brain-boxes of the types occurs in the 
region of their maximum parietal breadths, and this has been shown to be the 
absolute measurement which distinguishes the male series more effectively than 
any other. As may be shown from other comparisons, a difference in this region 
may be larger and more significant than those between all frontal or occipital 
breadths. 






     7L          8R          8L          9R          9L          10R         10L
♂    69.5 (112)  64.1 (112)  64.7 (112)  54.5 (112)  55.1 (112)  38.0 (112)  39.5 (112)
♀    ?0.3 (86)   62.0 (86)   62.1 (86)   53.0 (86)   53.1 (?)    37.4 (81)   37.5 (86)

     ?           ?           x           y           x           y
♂    17.2 (112)  19.4 (112)  2.2 (112)   63.6 (112)  2.4 (112)   63.6 (112)
♀    17.5 (86)   17.9 (86)   2.7 (86)    61.0 (86)   2.7 (86)    60.8 (86)


TABLE V. 

Hythe Horizontal Contours. Mean Values. 

     FO           F½R         F½L         F1R         F1L         2R          2L          2½R         2½L         3R          3L
♂    177.1 (112)  24.9 (112)  23.7 (112)  35.4 (111)  34.5 (112)  47.8 (111)  47.4 (112)  50.5 (112)  50.2 (112)  52.9 (112)  52.3 (112)
♀    170.7 (87)   22.5 (87)   22.8 (87)   34.1 (87)   35.1 (87)   46.0 (87)   47.2 (87)   48.8 (87)   48.8 (87)   50.9 (86)   51.2 (87)

     6R           5L          7L          8R          8L
♂    (112)        (111)       (112)       (111)       (112)
♀    69.1 (87)    69.3 (87)   70.0 (87)   66.3 (87)   66.9 (87)




















TABLE VI. 

Hythe Sagittal Contours. Mean Values. 

♂  63.3 (112)  27.7 (112)  84.3 (112)  69.8 (112)  98.2 (111)  102.2 (111)  69.1 (112)  33.5 (112)  12.4 (112)  0.2 (112)  6.2 (94)  60.1
♀  60.0 (86)   25.4 (86)   81.6 (87)   67.0 (87)   96.7 (87)   98.1 (87)    66.2 (86)   32.0 (86)   11.6 (85)   0.5 (86)   6.8 (82)  47.2





























Fig. IV. Transverse Type Contour of 86 ♀ Hythe Skulls. 










Fig. VII. Sagittal Type Contour of 112 ♂ Hythe Skulls. 




























The sagittal sections are most conveniently superposed by making the points N 
(the nasions) coincident and the Nγ axes cover one another. When this is done 
there is found to be a remarkably close similarity between the two types. The 
difference between the lengths is again apparent, the Spitalfields Nγ line being 
2.9 mm. longer than the Hythe. The outlines of the nasal and frontal bones are 
practically coincident and the vertices and vaults also overlap. Behind the eighth 
parallel the Spitalfields outline gradually diverges beyond the other until the 
maximum difference is reached near the γ, then below the base line the outlines 
converge to meet at the inion and the arcs from that point to the opisthion cover 
one another. The basi-occipital surfaces between the basions and spheno-basions 
are practically coincident and the outlines of the palate and pre-maxillae almost 
exactly so. The Spitalfields auricular point is 3.7 mm. behind the Hythe*. By 
rotating one outline until the bases of the occipital bones (the λ Op. lines) 
coincide it can be seen that the two outlines of those bones are practically identical. 
It is a most significant fact that the greatest divergence disclosed in this way is 
one between the maximum lengths of the calvariae. This is associated with a 
difference between the sagittal lengths of the parietal bones and between the 
positions of the auricular points, but at the same time the sizes and shapes of 
the sections of the facial skeleton, of the base of the skull and of the frontal and 
occipital bones are undifferentiated. On superposing the female sagittal contours 
it is found again that there is a very close correspondence between the outlines of 
the facial skeleton and palate and of the base of the skull. The calvarial sections 
do not, however, show the same relationship as before. The outlines of the frontal 
bones diverge near the second parallel and the Spitalfields section falls inside the 
Hythe until they approach again near the opisthion. The fact that the Spitalfields 
sagittal contour is larger for the one sex and smaller for the other may be supposed 
due to the inadequate size of the sample on which its female type is based. 

The foregoing comparison between the two series from London and Kent has 
shown how closely similar they are. Judging from the more reliable male samples, 
the most essential differences between the types are in the maximum calvarial 
length (the Spitalfields being the greater) and in the maximum calvarial breadth 
(the Hythe being the greater). The only other sagittal length differentiated is 
that of the parietal bones from bregma to lambda, while the bi-zygomatic diameter 
distinguishes the types more effectively than any other breadth except the bi- 
parietal. The fact that the maximum length and breadth differ in opposite senses 
leads to a very appreciable difference between the cephalic indices. Other calvarial 
indices involving the length and breadth also make a clear, though less pronounced, 
distinction between the types, but the facial skeletons of the two appear to be 
almost identical in all respects. These relationships suggest forcibly that the 
differences between the two varieties of cranium may have been occasioned by 
a single factor. An increase in the maximum length or breadth of the brain may 

* An error was made in recording the x-co-ordinates of the auricular points for the male and female 
Spitalfields sagittal contours. The measurements given (Table XIII in that paper) are from N and not, 
as is stated, from γ. 





be supposed to be compensated for by a decrease in the other major diameter, so 
that there is little change in the volume of the organ. The skull, in this particular 
comparison, appears to have been affected principally in the parietal and temporal 
regions. The difference in length is accounted for almost entirely by the difference 
in the median sagittal lengths of the parietal bones and the difference in breadth 
is most marked near the centres of those bones (where the maximum breadth falls), 
but there is also an appreciable change in the widths of the occipital, frontal and 
temporal extremities of the parietal regions. Apart from this slight difference in 
breadth, the occipital and frontal bones, and the facial skeleton, appear not to have 
been modified in either size or shape. 

(7) Comparisons with British Cranial Series. Coefficients of racial likeness 
between the Hythe and eight other British cranial series are given in Table VII; 
female comparisons can only be made in six cases. The values based on all available 
of the total 31 characters are supposed to give the best measure of racial 
relationship. These reduced values show clearly that the Hythe stands far closer 
to the Spitalfields than to any other English type. The coefficients between the 
Spitalfields series and the others have been given, and by comparing these two lists 
we may conclude that the two most recently described English types are closely 
allied to one another and widely removed from all the others. The English Bronze 
Age type resembles the Hythe rather more closely than it does the Spitalfields, 
but it is the latter which is nearer to the seventeenth century Londoners, Anglo- 
Saxons, Iron Age and Neolithic peoples. It is unsafe, however, to attach any 
significance to different degrees of distant relationship. When indices and angles 
are considered alone, the Hythe coefficients with the Spitalfields and English 
Bronze Age types are found to be of the same order. The shapes of these three 
are quite similar, but the Bronze Age skull is very significantly larger than the 
others. As is usually found, the facial characters are less capable of discriminating 
the types than are the calvarial ones. This is most noticeably so in the case of the 
comparisons with the Moorfields series. The reduced coefficient for facial characters 
is the second lowest male value of this kind in Table VII and the female value is 
the lowest of all and actually insignificant. For the same group of characters the 
Moorfields was found to have the lowest male reduced coefficient with the Spitalfields 
series and the female value was also insignificant*. This is a curious result, 

* The coefficients of racial likeness between the Spitalfields and other British series are given in 
Table IV of the Spitalfields paper. An error was made in computing the female coefficients with the 
Moorfields series. The values below are the corrected ones, the n's and coefficients for facial characters 
remaining unchanged. 


                        All Characters      Indices and Angles   Calvarial Characters
Crude Coefficients      15.96 ± .18 (29)    17.90 ± .29 (11)     39.00 ± .24 (16)
Reduced Coefficients    48.42 ± .40






but we can scarcely doubt that the best measure of racial affinity is the one which 
takes into account all the more important features of the skull. 

No detailed comparison of single mean measurements need be made. The most 
significant differences between the Hythe and the British series other than the 
Spitalfields are found for the characters 100 B/L, 100 H′/L and L, while B, H′, U, 
S, Q′, J and 100 NB/NH′ also show several high values of α. The male means of 
all these measurements, except H′ and J, are given in Table VI of the Spitalfields 
paper. The Hythe series has the highest cephalic index (82.6), the English Bronze 
Age the second (80.9) and the Spitalfields the third highest (79.4); the height-length 
index is highest for the Hythe (75.4) and the English Bronze and Spitalfields 
have the same value (73.7), which is the next highest; the Hythe has the 
shortest calvarial length (177.9) and the Spitalfields the next shortest (180.7); 
the nasal index is highest for the Spitalfields (52.0) and next highest for the 
Hythe (50.7); the horizontal circumference is smallest for the Spitalfields (517.0) 
and next smallest for the Hythe (518.4), while the sagittal circumference is 
smallest for the Hythe (365.6) and next smallest for the Spitalfields (368.5). The 
small sizes of the calvariae of these two types distinguish them clearly from all 
the other British series. 

(8) Comparisons with non-British Cranial Series. The Hythe type is seen to 
be closely connected with the Spitalfields and widely removed from all other British 
series available. It should be possible to find a number of non-British series which 
resemble the Hythe more closely than do all the British series except the Spitalfields. 
Comparisons were made with the male means of over 70 European samples 
and the coefficients were calculated in all cases when the connection appeared to 
be at all close. There are eight series in addition to the Spitalfields giving reduced 
coefficients for all characters less than nine, and comparisons with these are made 
in Table IX. Table VIII gives the mean measurements of four of these series, and 
all the others involved have been previously published in Biometrika. References to 
the sources from which the data relating to the Finns and Austrians (Vienna) were 
taken will be found in the Spitalfields paper, and the following papers provide 
additional series: 

(i) J. Matieka: “Crânes et ossements des anciens cimetières de la ville de Prague. 
I. Les crânes vieux praguois du cimetière de St. Nicholas dans la vieille ville 
(Prague–I).” Anthropologie (Prague), Vol. II, 1924, pp. 183–210 (in Czech with 
French résumé). The cranial series described came from a cemetery which was 
used from the thirteenth century until 1635. There are 115 male specimens, but 
facial measurements can only be given for small numbers. Means quoted in our 
Table VIII are for the “male” and “male?” groups combined. 

(ii) J. Matieka: “On the Craniology of the Jews. I. The Skulls from the 
Old Cemetery (Prague–V).” Ibid., Vol. IV, 1926, pp. 163–219 (in Czech with 
English résumé). The majority of the 53 male skulls from a single Jewish cemetery 
belong to the seventeenth century. The means are quoted in our Table VIII. The 


type is very similar to that of the contemporary population buried in the Christian 
cemetery of St Nicholas. 

(iii) F. Schiff: “Beiträge zur Kraniologie der Czechen.” Archiv für Anthropologie, 
Bd. xxxix (N.F. Bd. xi), 1912, S. 253–292. There are 108 male Beinhausschädel 
(charnel-house skulls) measured in a number of different localities in Bohemia. Means are 
given in Biometrika, Vol. xxB, 1928, pp. 366–367, and the means of the following 
series are in the same place. 

(iv) A. Weisbach: “Die Schädelform der Rumänen.” Denkschriften der kaiserlichen 
Akademie der Wissenschaften. Mathematisch-naturwissenschaftliche Classe, 
Bd. xxx, 1870, S. 107–136. Measurements are given of 40 modern skulls of soldiers. 

(v) C. Luigi Calori: “Del Tipo Brachicefalo negli Italiani Odierni.” Memorie 
dell'Accademia delle Scienze dell'Istituto di Bologna, Serie II, Tomo VIII, 1868, 
pp. 205–234. Measurements are given of 100 male skulls representing the modern 
population of Bologna and of known sex and age, and the means calculated for these 
are in our Table VIII. 

(vi) F. Ferraz de Macedo: Crime et Criminel. Essai synthétique d'observations 
anatomiques, physiologiques, pathologiques et psychiques sur les délinquants vivants 
et morts selon la méthode et les procédés anthropologiques les plus rigoureux. Lisbon, 
1892. Mean measurements determined by the author are given of a collection of 
male skulls of 13 assassins, 25 thieves and 9 swindlers (escrocs) made by Lombroso 
at Turin and these are quoted in our Table VIII. It is probable that the majority, 
if not all, of these criminals were of Italian origin. 

The Hythe series is seen from Table IX to have its closest connection with the 
Spitalfields although there are several other reduced coefficients of the same order 
with series from south and east-central Europe and with a series of Finns. All these 
relationships are much more intimate than any which can be found between the 
Hythe and any British series other than the Spitalfields. A similar comparison 
between the last and all the available European types was made, and the reduced 
coefficients with the Pompeians (3.54 ± .17) and Etruscans (4.04 ± .17) are lower 
than any now found. The lowest value which it is generally possible to find for these 
European series is of the order 3—5, so there is no reason to believe that the two 
English ones are peculiarly specialised. It may be noted that in the comparisons 
of the Spitalfields type with those most closely allied to it (see the Spitalfields 
paper, Table VIII) the reduced coefficients for facial characters alone were in every 
case greater, and in most decidedly greater, than the reduced coefficients for calvarial 
characters alone. Such a relation is the reverse of that generally found. From 
Table IX it may be seen that some of the facial coefficients with the Hythe series 
are less than the corresponding calvarial ones while others are greater, and the reduction 
is quite marked in the comparison with the Spitalfields series. The last, 
then, is mainly distinguished from all the continental types to which it is most 
closely allied by possessing some peculiar facial characters—in particular a low 
orbital and high nasal index—but it is mainly distinguished from the Hythe by 





TABLE VIII. 

Mean Male Measurements of the Hythe and some closely allied Series*. 

Character             Hythe          Italian:          Czech: Prague   Jewish:         Italian
                                     Bologna           (Matieka)       Prague          (Criminals)

C                     1450.3 (110)   1666.2 (100)†     1498.8 (46)
L                     177.9 (112)    174.2 (100)       178.7 (115)     180.5 (53)      177.8 (47)
B                     146.7 (112)    142.9 (100)       149.0 (115)     147.8 (52)      146.7 (47)
H′                    134.1 (112)    133.6 (100)       132.5 (80)      131.2 (49)      131.8 (47)
B′                    9?.6 (109)     99.9 (100)        98.6 (111)      97.8 (33)       97.8 (47)
B″                    124.8 (102)    –                 125.3 (113)     121.4 (33)      –
Biasterionic B        112.7 (97)     –                 113.2 (112)     113.3 (30)      111.5 (47)
LB                    100.6 (112)    99.0 (100)        99.6 (80)       100.4 (28)      96.6 (47)
S₁′                   111.4 (112)    113.3 (100)       109.7 (109)     109.1 (33)      –
S₂′                   108.9 (112)    111.0 (100)       110.6 (114)     111.9 (33)      –
S₃′                   96.1 (112)     93.8 (100)        91.4 (110)      94.7 (33)       –
S₁                    127.2 (112)    128.5 (100)       126.4 (110)     123.9 (33)      129.1 (47)
S₂                    122.3 (112)    126.7 (100)       122.9 (113)     125.4 (33)      124.1 (47)
S₃                    116.2 (112)    114.1 (100)       113.7 (110)     116.6 (33)      112.3 (47)
S                     365.6 (112)    368.4 (100)       367.6 (107)     365.0 (32)      366.6 (47)
U                     518.4 (112)    513.8 (100)       –               –               513.3 (47)
Glabella U            –              –                 518.9 (115)     520.3 (33)      –
Broca's Q′            311.3 (112)    –                 320.9 (107)     308.1 (31)      316.4 (47)
fml                   35.6 (111)     34.4 (100)        35.6 (83)       34.8 (24)       35.7 (47)
fmb                   30.2 (110)     30.1 (100)        29.3 (88)       29.9 (24)       30.4 (47)
J                     134.3 (96)     132.1 (100)       130.5 (32)      134.5 (23)      132.3 (47)
G′H                   69.9 (89)      69.4 (100)        67.9 (33)       67.7 (24)       –
GL                    94.9 (89)      –                 93.9 (30)       96.2 (24)       –
GB                    94.8 (99)      –                 93.4 (28)       95.4 (23)       –
NH′                   49.2 (99)      –                 49.3 (34)       52.2 (24)       51.8 (47)
NB                    24.8 (108)     –                 25.0 (33)       25.3 (24)       23.6 (47)
Lacrymal O₁           39.0 (73)      38.6 (100)        37.2 (34)       40.4 (24)       38.6 (47)
O₂                    33.0 (111)     32.8 (100)        32.3 (35)       33.25 (24)      33.2 (47)
SC                    9.6 (109)      –                 –               9.7 (23)        –
100 B/L               82.6 (112)     82.3 (100)        83.6 (115)      82.0 (63)       82.4 (47)
100 H′/L              75.4 (112)     {76.7 (100)}‡     74.4 (85)       72.6 (49)       74.0 (47)
100 B/H′              109.5 (112)    {107.0 (100)}     {112.5 (86)}    {112.7 (49)}    {111.3 (47)}
100 fmb/fml           84.9 (110)     {87.0 (100)}      83.7 (80)       86.2 (29)       85.3 (47)
Oc. I.                59.8 (112)     59.9 (100)        {57.3 (110)}    59.0 (33)       –
100 G′H/GB            73.5 (80)      –                 71.7 (26)       71.1 (23)       –
100 NB/NH′            50.7 (95)      –                 50.9 (33)       48.6 (24)       45.6 (47)
100 O₂/Lacrymal O₁    84.6 (73)      {84.9 (100)}      85.4 (33)       82.35 (24)      86.2 (47)
N∠                    64°.6 (89)     –                 {64°.9 (30)}    {66°.3 (24)}    –
A∠                    73°.7 (89)     –                 {74°.3 (30)}    {73°.6 (24)}    –
B∠                    41°.7 (89)     –                 {40°.8 (30)}    {40°.2 (24)}



* The mean measurements of all the other series used in this paper will be found in the papers in
earlier volumes of Biometrika cited in the Spitalfields paper.

† This capacity was not used in calculating the coefficients of racial likeness as it is almost certainly
too high.

‡ Indices and angles in curled brackets were found from the means of the component lengths
instead of from individual values. The occipital index found in this indirect way is generally about one
unit smaller than the true mean, but there is a closer approximation for all other indices and for the
angles.
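
The distinction drawn in the footnote on bracketed indices can be illustrated numerically. The sketch below, in Python, contrasts an index formed from the means of two lengths with the mean of the individual indices. The function names and the four pairs of measurements are hypothetical and are not taken from the Hythe data; they only show that the two quantities generally differ by a small amount.

```python
from statistics import mean


def index_of_means(numerators, denominators):
    """Index found indirectly: 100 * (mean numerator) / (mean denominator)."""
    return 100 * mean(numerators) / mean(denominators)


def mean_of_indices(numerators, denominators):
    """Index found directly: mean of the individual values 100 * a / b."""
    return mean(100 * a / b for a, b in zip(numerators, denominators))


# Hypothetical foramen magnum breadths and lengths (mm), for illustration only.
fmb = [28.0, 30.0, 31.0, 32.0]
fml = [33.0, 36.0, 35.0, 38.0]

indirect = index_of_means(fmb, fml)   # the "curled bracket" method
direct = mean_of_indices(fmb, fml)    # the mean of individual indices

print(f"index from means    : {indirect:.2f}")
print(f"mean of the indices : {direct:.2f}")
print(f"difference          : {direct - indirect:.2f}")
```

With these invented figures the two results differ by a few hundredths of a unit, which is in keeping with the footnote's remark that the approximation is close for most indices.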





TABLE IX. 

Lowest Male Coefficients of Racial Likeness between the Hythe and Other Series*. 





Footnote as for Table VII. 






certain calvarial characters. The facial characters of the Hythe type are less peculiar, 
but they also tend to affect the coefficients for all characters more in proportion to 
the effect of the calvarial characters than is found for comparisons in general. 

In order to make the comparison between these two closely allied English series 
and the continental ones as complete as possible, the coefficients of racial likeness 
were calculated between all possible pairs of the series which closely resemble 
either of the English ones. The reduced values for all characters between the 
Spitalfields series and the six continental series to which it is most closely allied 
are given in Table X of the earlier paper, and the highest value found between 
any pair of these seven series was 8.57. Those between the Hythe and the eight
continental series to which it is most closely allied are in Table X of the present
paper, and the highest value found between any pair of these nine series is 20.32,
while 15 of the possible 36 comparisons show coefficients greater than 9. The 
Spitalfields series is fully entitled to be considered a member of a group of closely 
allied racial types, all the others being continental ones. The Hythe cannot be 
considered a member of the same group and its nearest relationships are with a 
number of types which are not all intimately allied to one another. The foreign series 
which are closely linked to both English series are those of Finns, Austrians (Vienna) 
and Etruscans, and in each one of these cases the Spitalfields series has a lower
coefficient than the Hythe. A diagram illustrating the fact that the two types with
which we are primarily concerned have distinctly different racial affinities, in spite 
of their close resemblance to one another, has been given in the Spitalfields paper 
(Fig. XIII). There are twelve continental series which we have selected from a much 
larger number on account of their affinities to the two English ones (see Table X): 
four of these are of Italian origin, one came from Vienna and three (including the 
Jewish one) from Bohemia, while another came from Rumania. These nine series 
represent a comparatively restricted area of Southern Europe. Two of the remaining 
three came from Paris and the other from Finland. The only surprising feature of 
this grouping is the association of the Finns with the other racial types. 

We have to attempt to reconcile these results, which have been reached by 
using purely quantitative methods, with other evidence relating to the ethnic 
history of England. The Spitalfields and Hythe types are closely similar to one 
another and both may be considered alien in the sense that they cannot be supposed 
to have represented the bulk of the population of the country at any one time. It 
is extremely unlikely that they were intrusive in England during any prehistoric 
period. The fact that the Spitalfields type bears its closest resemblances to those 
of Pompeians and Etruscans—the connections with these being as intimate as those 
between two seventeenth century London series—suggests forcibly that we are 
dealing with the population of London in Roman times. The sample known to us 
was probably of pure, or almost pure, Italian origin. The Hythe crania are certainly 
of post-Roman date, but we suggest that they represent a population which was 
directly descended from the marines and auxiliaries who are known to have been 
stationed in the neighbourhood during the occupation. There were probably 



TABLE X. Reduced Coefficients of Racial Likeness between Male Series closely related to the Hythe and Spitalfields Series.

[Table printed sideways in the original; its entries are not recoverable from the scan. Legible series labels: Hythe; Spitalfields; Italian: Bologna; Czech (Matiegka); Czech (Schiff); Jewish: Prague; Italian (Criminals); Austrian: Vienna; Parisian: L'Ouest. The series are arranged in three groups: series closely related to the Hythe series, series closely related to the Hythe and Spitalfields series, and series closely related to the Spitalfields series.]



merchants settled there at this time as well. Judging from the affinities of their 
presumed descendants, these people were partly of Italian and partly of East European
origin, the closest affinities of the later population being found with modern types
from Bologna, Bohemia and Rumania. The inhabitants of Hythe, and of the castle
and port at Lympne, may have formed a racially heterogeneous populace in Roman
times, but the variability of later generations would have been reduced by
intermarriage. It is probable that the Italian population of London which remained was
exterminated, or absorbed by larger numbers of new-comers, in the Saxon period, 
but it appears to us not improbable that the Roman population of Hythe persisted, 
with no appreciable modification from outside, until late mediaeval times at least. 
This peculiar racial type was less likely to remain pure as the town declined in 
size, but it may only have been completely absorbed in recent times by that one to 
which the vast majority of the present-day inhabitants of England conform. These 
appear to us to be the most reasonable hypotheses which will reconcile all the 
evidence at present available, but a more extended anthropometric survey of the 
past and present populations of England would be needed in order to make theories 
of this kind at all definite. 

(9) Conclusions. The age of the human skeletons preserved in the ambulatory
passage of St Leonard's Church, Hythe, is not known exactly. The earliest direct
reference to them is found in a book written in 1678, and they appear to have been 
shown regularly to visitors from some years before that until the present day. The 
church is an early Norman one and it was enlarged considerably in Late Norman 
times and again at the beginning of the thirteenth century. The passage was built 
under the chancel at this last date with the object, it is thought, of preserving a 
processional way on consecrated ground all round the outside of the building. It is 
well lighted and above ground, and these circumstances account for the fact that it 
has remained open to inspection. Various theories associating the skeletons with 
battles or massacres have been broached at different times, but all these are quite 
untenable, and two of the battles referred to were probably not fought anywhere 
near Hythe. Men and women are represented approximately in the proportion 1.3
to 1 and there are also a few juvenile individuals present. There are said to be more 
than 8000 femora in the collection, and there are probably more than 2000 skulls, 
of which half are nearly complete. Other bones of the skeleton, however, can only 
be found in small numbers, and it is clear that there was a stringent selection in 
favour of the crania and femora when the collection was made. Everything points 
to the fact that these are graveyard bones and no other plausible explanation of 
their origin can be offered. The town was at one time considerably larger than it 
is to-day. There were at least three churches within the present-day limits of 
Hythe and another about a mile away. All these, except St Leonard's, had been 
abandoned by the end of the fourteenth century. The disused graveyards do not 
appear to have been built on since, but bones may have been disinterred from them 
at various times. It is known that three skulls were taken from one of the sites 
and placed in the ambulatory as recently as 1912. The majority of the remains were 




probably dug up from the ground of St Leonard’s and it is quite likely that the 
nucleus of the collection was placed in the ambulatory shortly after it had been 
built, since this extension of the church almost certainly covered the original 
graveyard. When the practice had once been started it may well have continued 
for some centuries, but the vast majority, if not all, of the bones must have been 
collected before 1650. It is most probable that the chamber was used both as a 
charnel-house and as an ambulatory at the same time. The suggestion that it 
would not have been used for the former purpose until the processions were discontinued
at the Reformation has been made. This would leave only the hundred
years from 1550 to 1650 for the bulk of the material to have been collected. The 
yearly burials in this period are known to have been between 30 and 40 and the 
normal disinterments must have been supplemented by the addition of at least 
1000 skeletons dug up at one particular time in order to give the total now present. 
It appears to us more probable that the collection was made between 1250 and 
1650 and that the people represented died between 1100 and 1600. These dates 
are, of course, only approximate and it is just possible that an examination of the 
not yet edited records of Hythe might render them more precise. 

The Portus Lemanis, one of the principal ports of the Saxon Shore, was at
West Hythe, two miles from St Leonard's Church, and it was protected by the
large castrum which became known later as Stutfall Castle. Roman remains have 
also been found in Hythe itself, and there must have been a considerable population 
in the district during the occupation. Little is known about the town in the Saxon 
period, but it became a Cinque Port about the time of the Norman Conquest. The 
primary cause of its decline was the silting up of the harbour which once extended 
from West Hythe to Sandgate. This gradual process had made the coast entirely 
unsuited for a large port before 1600. Fire, plague and the ravages of French 
seamen hastened the decline of the population. 

The first anthropological description of the Hythe crania which is of any value 
is that of Professor F. G. Parsons published in 1908. He gives a few measurements
of 590 specimens. We have dealt with a separate sample of 199 crania, and all the 
usual measurements, type contours and remarks are provided for these. The two 
independent series might be supposed to have been drawn at random from the 
original material, except that preference was given to the more complete specimens 
in our case. The mean measurements of the female series show no differences of
any consequence, but Professor Parsons found a male calvarial breadth which is 
significantly less than ours, and the male cephalic indices are as a result also 
differentiated. It is shown that these differences are not due to instrumental errors, 
or to differences of technique. Hypotheses are suggested to account for this deviation
in a single character between the crania shelved in the south bay (Parsons)
and in the north bay (Stoessiger and Morant) of the ambulatory*.

Our series shows as small a variability as any other European series. Comparisons
between it and other European series were made by Professor Karl Pearson's

* See Note, p. 202. 





method of the coefficient of racial likeness. The Hythe type was found to be widely 
removed from all other British types, except that of the crania recently excavated 
at Spitalfields Market, London, and these two resemble one another closely. Their 
cephalic indices (82.6 and 79.4 respectively for male skulls) are both distinctly
higher than all other English mean values except those of the Bronze Age (80.9),
but the last skull is of a larger type than the other two and cannot be supposed 
to be closely allied to them. The Hythe is more nearly related to the Spitalfields 
than to any other series with which it can be compared, and the connection is as 
close as the most intimate which are normally met with among European series. 
Relationships of the same order are found with modern types representing the 
populations of Bologna, Bohemia and Rumania. The Spitalfields series has rather 
closer relationships with both Pompeians and Etruscans than with the mediaeval 
inhabitants of Hythe. There appears to be only one plausible explanation of these 
facts. Both the types must be supposed alien to England, in so far as they do 
not represent the bulk of the population of this country at any one period. The 
Spitalfields interments are very possibly of Roman date and they are probably 
those of people of pure, or nearly pure Italian origin. The Hythe skeletons are 
certainly of post-Roman date, but they are probably those of people who were 
directly descended from the foreigners who lived in the district during the occupation.
These foreigners may be supposed to have come partly from Italy and
partly from Central Europe or the Balkan area, but their descendants would show 
less variability owing to inter-marriage. There is good reason to believe that the 
Roman element in the population of London had been eliminated before mediaeval 
times, but this was not so at Hythe. Measurements of the present-day inhabitants 
suggest, however, that the type which lingered on for several centuries has at last 
become transformed into one which appears to be spread uniformly over the 
country at present. More extended anthropometric investigations of the existing 
and past populations of England will be needed in order to test the validity of 
these hypotheses. 


DESCRIPTION OF PLATES. 

I. Engravings of the Interior of the Ambulatory Passage of St Leonard’s Church, Hythe. 

(a) By Thomas Russell, 1783. The legend associates the bones with the battle between the Britons
and Saxons in 456 A.D. The stack of bones is said to be thirty feet long, eight feet high and eight
feet "over." (b) By W. Deeble. From: Delineations, Historical and Topographical, of the Isle of
Thanet and the Cinque Ports, Vol. II. 1818. By E. W. Brayley. (c) By G. Rowe, N.D. The
drawing was probably made shortly after the skulls were first arranged in 1851 on the shelves.

II. Typical Hythe Skulls. Norma lateralis. 

(a) No. 1043. Male. (b) No. 758. Female.

III. Typical Hythe Skulls. Norma facialis.

(a) No. 1043. Male. (b) No. 758. Female.

IV. Typical Hythe Skulls. Norma verticalis. 

(a) No. 1043. Male, (b) No. 758. Female. 





V. Abnormal Hythe Skulls. 

(a) No. 929. Male. Eminence at vertex. (b) No. 967. Female. Unusual globular form, possibly
hydrocephalous.

VI. Basal Anomalies of the Hythe Skulls.

(a) No. 1004. Female. Fossa pharyngea and irregular anterior border of foramen. (b) No. 659.
Female. Opisthial notch. (c) No. 1071. Male. Thickened right side of basi-occipital. (d) No. 771.
Female. Articular surface at basion.

VII. Anomalous Hythe Skulls. 

(a) No. 620. Female. Canine erupting behind incisors. (b) No. 646. Male. Anomalous dentition
on right side. (c) No. 867. Female. Ex-occipital separate on left side. (d) No. 895. Male.
Arthritic glenoid surface and large tympanic perforation. (e) and (f) No. 1064. Female.
Anomalous auricular passages with irregular posterior walls and multiple perforations of the 
tympanic plates. 

Note (May 20th, 1932). The fact that significant differences are found between the samples
of the Hythe crania measured by Professor Parsons and by the present writers was of sufficient
importance to make further inquiry desirable. He measured the total 556 skulls on the shelves
in the south bay of the ambulatory, and on re-measuring 117 of these we found a close agreement
between his and our individual readings. The 199 more complete crania on the shelves in the
north bay were selected from the total of about 500 there, and the selected group is described in
detail in the present paper. Table I above shows that there are significant differences between
the two principal samples from the north and south bays respectively in the case of the mean 
male breadths and cephalic indices. These might have been due to the fact that our sample was 
selected while that dealt with by Professor Parsons was not. To decide this point we have 
recently measured the lengths and breadths of about three-quarters of the remaining (and hence 
less complete) crania on the shelves in the north bay for which the cephalic index can be found. 
The means for this sample are:

            L                     B                     100 B/L
♂     179.1 ± .40 (110)     146.0 ± .38 (110)     81.6 ± .27 (110)
♀     171.3 ± .44 (76)      141.3 ± .44 (76)      82.6 ± .31 (76)


Thus the means of the two samples from the north bay agree closely with one another, and it 
must be concluded that the crania in the south bay have a slightly lesser mean breadth and 
cephalic index than those of the crania in the north bay. We infer that the original pile contained 
differentiated strata; whether due to a secular change in the Hythe population, or to the influx
of another race, it is not possible to determine.
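
The ± values quoted in the Note lend themselves to a short numerical sketch. It is assumed here, following the usual practice of the period, that each ± figure is a probable error, that is 0.6745 times the standard error of the mean; the source does not state this in the present passage. Under that assumption the standard deviation of a character can be recovered from its probable error and sample size, and a difference between two means can be expressed in units of its own probable error. The function names are hypothetical, and the male and female indices are paired purely as an arithmetical illustration.

```python
import math

# Assumed convention: probable error = 0.6745 * standard error (normal theory).
PE_FACTOR = 0.6745


def sd_from_probable_error(probable_error, n):
    """Standard deviation implied by the probable error of a mean of n values."""
    return probable_error * math.sqrt(n) / PE_FACTOR


def difference_in_probable_errors(mean_1, pe_1, mean_2, pe_2):
    """Absolute difference of two independent means divided by its probable error."""
    return abs(mean_1 - mean_2) / math.hypot(pe_1, pe_2)


# Male length from the Note: 179.1 +/- .40 with 110 crania.
sd_length = sd_from_probable_error(0.40, 110)
print(f"implied s.d. of male L: {sd_length:.2f} mm")

# Male and female cephalic indices from the Note, used only as an illustration.
ratio = difference_in_probable_errors(81.6, 0.27, 82.6, 0.31)
print(f"difference / probable error of difference: {ratio:.2f}")
```

The same helper could be applied to the north-bay and south-bay means once the probable errors of both samples are to hand; those of the south-bay sample are not given in this passage.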




[Plates and folding type contours. Biometrika, Vol. XXIV, Parts I and II. Stoessiger and Morant, A Study of the Hythe Crania.]

Plate I. Engravings of the Interior of the Ambulatory Passage of St Leonard's Church, Hythe. (b) 1818. (c) After 1851.

Plate II. Typical Hythe Skulls. Norma lateralis (ca. 0.6 natural size). (b) No. 758. Female.

Plate III. Typical Hythe Skulls. Norma facialis (ca. 0.6 natural size). (a) No. 1043. Male. (b) No. 758. Female.

Plate IV. Typical Hythe Skulls. Norma verticalis. (a) No. 1043. Male. (b) No. 758. Female.

Plate V. Abnormal Hythe Skulls (ca. 0.6 natural size). (a) No. 929. Male. Eminence at vertex. (b) No. 967. Female. Unusual globular form, possibly hydrocephalous.

Plate VI. Basal Anomalies of the Hythe Skulls (ca. 1.4 natural size). (a) No. 1004. Female. Fossa pharyngea and irregular anterior border of foramen. (b) No. 659. Female. Opisthial notch. (c) No. 1071. Male. Thickened right side of basi-occipital. (d) No. 771. Female. Articular surface at basion.

Plate VII. Anomalous Hythe Skulls. (a) No. 620, ca. 1.3 natural size. Female. Canine erupting behind incisors. (b) No. 646, ca. 1.2 natural size. Male. Anomalous dentition on right side. (c) No. 867, ca. 1.5 natural size. Female. Ex-occipital separate on left side. (d) No. 895, ca. 1.5 natural size. Male. Arthritic glenoid surface and large tympanic perforation. (e) and (f) No. 1054, ca. 1.9 natural size. Female. Anomalous auricular passages with irregular posterior walls and multiple perforations of the tympanic plates.

Type contours: Transverse Type Contour of 112 ♂ Hythe Skulls. Transverse Type Contour of 56 ♀ Hythe Skulls. Horizontal Type Contour of 112 ♂ Hythe Skulls. Horizontal Type Contour of 86 ♀ Hythe Skulls.

































ON THE MEAN CHARACTER AND VARIANCE OF A 
RANKED INDIVIDUAL, AND ON THE MEAN AND 
VARIANCE OF THE INTERVALS BETWEEN 
RANKED INDIVIDUALS. 


Part II. Case of Certain Skew Curves.

By KARL PEARSON with the assistance of MARGARET V. PEARSON. 


(1) In an earlier paper* I dealt at length with the case of Ranks and Rank-
Intervals in sampling from a normal distribution. The methods were approximate,
but sufficed to link up the accurate expressions of Professor Hojo† for small
samples with the asymptotic values for large samples given by me in 1920‡.

In the present paper I propose to consider the influence of skewness in the
parent distributions on the distribution of ranks, but in doing so confine myself at
present to certain special skew-curves, which lie, however, rather widely distributed
over the (β₁, β₂) plane, and have accordingly a very considerable range of skewness.
The selection of these curves for the second part of this paper has been made, in
the first place, because the ranking problem admits in their cases of exact solution.

I shall start with the exponential curve

\[ y = \frac{N}{\sigma}\, e^{-x/\sigma}, \]

where the mean is at a distance σ, the standard deviation, from the origin§. I shall state in the first place certain
preliminary propositions relating to the B-function.
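
As a brief check on the position of this curve in the (β₁, β₂) plane, the moments may be written out. The working below is a sketch using the standard moments of the exponential curve with x running from 0 to infinity; the symbols μ′ₖ (moments about the origin) and μₖ (moments about the mean) are introduced here for the purpose and are not taken from the text.

\[
\mu'_k = \frac{1}{N}\int_0^\infty x^k\,\frac{N}{\sigma}\,e^{-x/\sigma}\,dx = k!\,\sigma^k ,
\qquad\text{so that}\qquad \text{mean} = \mu'_1 = \sigma .
\]

\[
\mu_2 = \mu'_2 - \mu_1'^{\,2} = \sigma^2, \qquad
\mu_3 = \mu'_3 - 3\mu'_1\mu'_2 + 2\mu_1'^{\,3} = 2\sigma^3, \qquad
\mu_4 = \mu'_4 - 4\mu'_1\mu'_3 + 6\mu_1'^{\,2}\mu'_2 - 3\mu_1'^{\,4} = 9\sigma^4 .
\]

\[
\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{4\sigma^6}{\sigma^6} = 4, \qquad
\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{9\sigma^4}{\sigma^4} = 9 .
\]

The standard deviation is thus equal to the mean, both being σ, and the curve sits at the point (4, 9), well removed from the normal point (0, 3).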


(2) On certain Functions allied to the B-function. 

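The well-known relation

\[ B(q, q') = \frac{\Gamma(q)\,\Gamma(q')}{\Gamma(q+q')} \]

leads us at once to

\[ \frac{d \log B(q, q')}{dq'} = \frac{d \log \Gamma(q')}{dq'} - \frac{d \log \Gamma(q+q')}{dq'} \]

and generally

\[ \frac{d^{i} \log B(q, q')}{dq'^{\,i}} = \frac{d^{i} \log \Gamma(q')}{dq'^{\,i}} - \frac{d^{i} \log \Gamma(q+q')}{dq'^{\,i}} . \]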


Now 




where γ = Euler's constant = .5772,1566,4901,53.


* Biometrika, Vol. XXIII. pp. 364–397.
† Ibid. Vol. XXIII. pp. 316–360.
‡ Ibid. Vol. XIII. pp. 113–132.
§ We have β₁ = 4, β₂ = 9.




Rank- Variates and Rank-Intervals 


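Let us consider the special case of

\[ B(q, n-q+1) = \frac{\Gamma(q)\,\Gamma(n-q+1)}{\Gamma(n+1)} ; \]

then

\[
\frac{d \log B(q, n-q+1)}{dn}
= \frac{d \log \Gamma(n-q+1)}{d(n-q)} - \frac{d \log \Gamma(n+1)}{dn}
= -\left(\frac{1}{n} + \frac{1}{n-1} + \frac{1}{n-2} + \dots + \frac{1}{n-q+1}\right)
= -\sum_{s=0}^{q-1} \frac{1}{n-s},
\]

\[
\frac{d^{2} \log B(q, n-q+1)}{dn^{2}}
= \frac{1}{n^{2}} + \frac{1}{(n-1)^{2}} + \frac{1}{(n-2)^{2}} + \dots + \frac{1}{(n-q+1)^{2}}
= \sum_{s=0}^{q-1} \frac{1}{(n-s)^{2}},
\]

\[
\frac{d^{3} \log B(q, n-q+1)}{dn^{3}} = -2 \sum_{s=0}^{q-1} \frac{1}{(n-s)^{3}},
\]

and generally

\[
\frac{d^{i} \log B(q, n-q+1)}{dn^{i}} = (-1)^{i}\,(i-1)!\, \sum_{s=0}^{q-1} \frac{1}{(n-s)^{i}} .
\]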
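
The first of these relations can be checked numerically. The sketch below, in Python, differentiates log B(q, n − q + 1) with respect to n by a central difference, holding q fixed, and compares the result with the finite sum. The function names and the step size are illustrative choices made here, not part of the original paper; only the standard-library function `math.lgamma` is used.

```python
import math


def log_beta(q, n):
    """log B(q, n - q + 1) = log G(q) + log G(n - q + 1) - log G(n + 1)."""
    return math.lgamma(q) + math.lgamma(n - q + 1) - math.lgamma(n + 1)


def dlogbeta_dn_numeric(q, n, h=1e-5):
    """Central-difference estimate of d log B(q, n - q + 1) / dn, q held fixed."""
    return (log_beta(q, n + h) - log_beta(q, n - h)) / (2 * h)


def dlogbeta_dn_series(q, n):
    """The finite sum -(1/n + 1/(n-1) + ... + 1/(n-q+1))."""
    return -sum(1.0 / (n - s) for s in range(q))


q, n = 3, 10
print(dlogbeta_dn_numeric(q, n))
print(dlogbeta_dn_series(q, n))   # -(1/10 + 1/9 + 1/8) = -121/360
```

The two printed values should agree to well within the accuracy of the finite difference, which is all that such a check can show.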


I propose to term the functions of B in the above series the dibeta, tribeta,
tetrabeta, pentabeta, etc. functions, corresponding to the similar functions of the
Γ-function, to which in 1919 I gave the names of digamma, trigamma, etc. functions.
The relations between these log beta and the log gamma functions are very simple,
but it is well to have separate names for them, and symbols likewise, as this much
simplifies the work we have in hand.

As the capital beta is not a convenient letter to modify I adopt the older
Greek form of β, that is ϐ, which is in itself of interest as showing the evolution of
that letter in its capital form. I wish it had originally been adopted to represent
the B-function itself. Adopting the symbols of 1919 we have


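Γ  Gamma Function,             ϐ  Beta Function,
   Digamma Function,              Dibeta Function,
   Trigamma Function,             Tribeta Function,
   Tetragamma Function,           Tetrabeta Function,
   Pentagamma Function,           Pentabeta Function.

The relations between these functions are exceedingly simple. We have

\[
\begin{aligned}
\operatorname{Beta}(q, n-q+1) &= \Gamma(q)\,\Gamma(n-q+1)/\Gamma(n+1),\\
\operatorname{Dibeta}(q, n-q+1) &= \operatorname{Digamma}(n-q+1) - \operatorname{Digamma}(n+1),\\
\operatorname{Tribeta}(q, n-q+1) &= \operatorname{Trigamma}(n-q+1) - \operatorname{Trigamma}(n+1),\\
\operatorname{Tetrabeta}(q, n-q+1) &= \operatorname{Tetragamma}(n-q+1) - \operatorname{Tetragamma}(n+1),\\
\operatorname{Pentabeta}(q, n-q+1) &= \operatorname{Pentagamma}(n-q+1) - \operatorname{Pentagamma}(n+1),
\end{aligned}
\]

and so on.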






Thus the various log ϐ(q, n − q + 1) differential coefficients can always be
obtained from the corresponding log Γ differential coefficients, and a table of the
latter will suffice to determine the former.

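For our present purposes we need these functions for integer values only of
n and q, and tables have been computed of inverse powers and sums of inverse
powers for the purposes of this paper. They give twelve figure values from n = 1
to 100. If the sum of the i-th powers of the inverse numbers from 1 to n be denoted by
S_n(1/n^i), then

\[
\begin{aligned}
\text{Dibeta} &= -\left(\frac{1}{n} + \frac{1}{n-1} + \dots + \frac{1}{n-q+1}\right)
 = -\left\{ S_{n}\!\left(\frac{1}{n}\right) - S_{n-q}\!\left(\frac{1}{n}\right) \right\},\\
\text{Tribeta} &= S_{n}\!\left(\frac{1}{n^{2}}\right) - S_{n-q}\!\left(\frac{1}{n^{2}}\right),\\
\text{Tetrabeta} &= -2 \left\{ S_{n}\!\left(\frac{1}{n^{3}}\right) - S_{n-q}\!\left(\frac{1}{n^{3}}\right) \right\},\\
\text{Pentabeta} &= 6 \left\{ S_{n}\!\left(\frac{1}{n^{4}}\right) - S_{n-q}\!\left(\frac{1}{n^{4}}\right) \right\},
\end{aligned}
\]

and so on.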
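
The use of the sums of inverse powers can be sketched in a few lines of Python. Exact rational arithmetic is used so that the results can be compared with the twelve-figure entries of the table. The names `S` and `derived_beta` are hypothetical conveniences; the sign and factorial factor follow the general formula for the i-th differential coefficient of log B(q, n − q + 1) given earlier.

```python
from fractions import Fraction
from math import factorial


def S(i, n):
    """Sum of the i-th powers of the reciprocals of 1, 2, ..., n (exact)."""
    return sum(Fraction(1, k ** i) for k in range(1, n + 1))


def derived_beta(i, q, n):
    """i-th differential coefficient of log B(q, n-q+1) with respect to n.

    i = 1 gives the dibeta, i = 2 the tribeta, i = 3 the tetrabeta,
    i = 4 the pentabeta function, for integer n and q.
    """
    return (-1) ** i * factorial(i - 1) * (S(i, n) - S(i, n - q))


print(float(S(1, 10)))           # compare with the tabled 2.9289 6825 3968
print(float(S(2, 10)))           # compare with the tabled 1.5497 6773 1167
print(derived_beta(1, 3, 10))    # dibeta: -(1/10 + 1/9 + 1/8)
print(derived_beta(2, 2, 3))     # tribeta: 1/9 + 1/4
```

For integers within the range of the table the differences S_n − S_{n−q} would of course be read off directly; the exact fractions serve here only as an independent check.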

The following tables give the S values. It is scarcely necessary to remark
that the Δ column gives the first differences, but they are not placed here for
interpolation purposes as we are concerned only with integer values of n and q.
But as they have been calculated to obtain S, it seemed worth while putting
them on record, as the reciprocals of powers of integers are often valuable for
other purposes to the computer.

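For values of n outside the limits of our table, we have with ample accuracy:

\[
\begin{aligned}
S_{n}\!\left(\tfrac{1}{n}\right) &= .5772,1566,4901,53 + 2.3025,8509,2994,05 \,\log_{10} n + \tfrac{1}{2} n^{-1} - \tfrac{1}{12} n^{-2} + \tfrac{1}{120} n^{-4} - \tfrac{1}{252} n^{-6},\\
S_{n}\!\left(\tfrac{1}{n^{2}}\right) &= 1.6449,3406,6848,23 - n^{-1} + \tfrac{1}{2} n^{-2} - \tfrac{1}{6} n^{-3} + \tfrac{1}{30} n^{-5} - \tfrac{1}{42} n^{-7} + \tfrac{1}{30} n^{-9},\\
S_{n}\!\left(\tfrac{1}{n^{3}}\right) &= 1.2020,5690,3159,59 - \tfrac{1}{2} n^{-2} + \tfrac{1}{2} n^{-3} - \tfrac{1}{4} n^{-4} + \tfrac{1}{12} n^{-6} - \tfrac{1}{12} n^{-8} + \tfrac{3}{20} n^{-10},\\
S_{n}\!\left(\tfrac{1}{n^{4}}\right) &= 1.0823,2323,3711,14 - \tfrac{1}{3} n^{-3} + \tfrac{1}{2} n^{-4} - \tfrac{1}{3} n^{-5} + \tfrac{1}{6} n^{-7} - \tfrac{2}{9} n^{-9} + \tfrac{1}{2} n^{-11} - \tfrac{5}{3} n^{-13}.
\end{aligned}
\]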
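
These expansions can be tested against direct summation for a value of n beyond the table. The Python sketch below assumes the coefficients that follow from the Euler-Maclaurin summation formula, with the last expansion truncated at the n^-11 term, and uses the natural logarithm in place of 2.3025... times the common logarithm. The constant names and function names are chosen here for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329       # Euler's constant
ZETA_2 = 1.6449340668482264            # limiting sum of 1/n^2
ZETA_3 = 1.2020569031595943            # limiting sum of 1/n^3
ZETA_4 = 1.0823232337111382            # limiting sum of 1/n^4


def s1(n):
    return (EULER_GAMMA + math.log(n) + 1 / (2 * n) - 1 / (12 * n**2)
            + 1 / (120 * n**4) - 1 / (252 * n**6))


def s2(n):
    return (ZETA_2 - 1 / n + 1 / (2 * n**2) - 1 / (6 * n**3)
            + 1 / (30 * n**5) - 1 / (42 * n**7) + 1 / (30 * n**9))


def s3(n):
    return (ZETA_3 - 1 / (2 * n**2) + 1 / (2 * n**3) - 1 / (4 * n**4)
            + 1 / (12 * n**6) - 1 / (12 * n**8) + 3 / (20 * n**10))


def s4(n):
    return (ZETA_4 - 1 / (3 * n**3) + 1 / (2 * n**4) - 1 / (3 * n**5)
            + 1 / (6 * n**7) - 2 / (9 * n**9) + 1 / (2 * n**11))


def direct(i, n):
    """Direct sum of 1/k^i for k = 1..n, accumulated with math.fsum."""
    return math.fsum(1 / k**i for k in range(1, n + 1))


n = 150
for i, approx in enumerate((s1, s2, s3, s4), start=1):
    print(i, approx(n), direct(i, n))
```

For n of this size the neglected terms are far below the twelfth decimal place, so the two columns printed should agree to the limit of double-precision arithmetic.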

For our analysis of ranking from the exponential curve we require to express
the differential coefficients of the ϐ-function in terms of the differential coefficients
of the log ϐ-function. These are given on p. 215, Equations (xviii)–(xxi).



TABLE I.

Table for finding the Derived Beta and Gamma Functions.

         Δ                   Dibeta               Δ                   Tribeta
  n      1/n                 S(1/n)               1/n²                S(1/n²)

  1   1.0000 0000 0000    1.0000 0000 0000    1.0000 0000 0000    1.0000 0000 0000
  2    .5000 0000 0000    1.5000 0000 0000     .2500 0000 0000    1.2500 0000 0000
  3    .3333 3333 3333    1.8333 3333 3333     .1111 1111 1111    1.3611 1111 1111
  4    .2500 0000 0000    2.0833 3333 3333     .0625 0000 0000    1.4236 1111 1111
  5    .2000 0000 0000    2.2833 3333 3333     .0400 0000 0000    1.4636 1111 1111

  6    .1666 6666 6667    2.4500 0000 0000     .0277 7777 7778    1.4913 8888 8889
  7    .1428 5714 2857    2.5928 5714 2857     .0204 0816 3265+   1.5117 9705 2154
  8    .1250 0000 0000    2.7178 5714 2857     .0156 2500 0000    1.5274 2205 2154
  9    .1111 1111 1111    2.8289 6825 3968     .0123 4567 9012    1.5397 6773 1167
 10    .1000 0000 0000    2.9289 6825 3968     .0100 0000 0000    1.5497 6773 1167

 11    .0909 0909 0909    3.0198 7734 4877     .0082 6446 2810    1.5580 3219 3976
 12    .0833 3333 3333    3.1032 1067 8211     .0069 4444 4444    1.5649 7663 8421
 13    .0769 2307 6923    3.1801 3375 5134     .0059 1715 9763    1.5708 9379 8184
 14    .0714 2857 1429    3.2515 6232 6562     .0051 0204 0816    1.5759 9583 9001
 15    .0666 6666 6667    3.3182 2899 3229     .0044 4444 4444    1.5804 4028 3445−

 16    .0625 0000 0000    3.3807 2899 3229     .0039 0625 0000    1.5843 4653 3445−
 17    .0588 2352 9412    3.4395 5252 2641     .0034 6020 7612    1.5878 0674 1057
 18    .0555 5555 5556    3.4951 0807 8196     .0030 8641 9753    1.5908 9316 0811
 19    .0526 3157 8947    3.5477 3965 7144     .0027 7008 3102    1.5936 6324 3913
 20    .0500 0000 0000    3.5977 3965 7144     .0025 0000 0000    1.5961 6324 3913

 21    .0476 1904 7619    3.6453 5870 4763     .0022 6757 3696    1.5984 3081 7609
 22    .0454 5454 5455−   3.6908 1325 0217     .0020 6611 5702    1.6004 9693 3312
 23    .0434 7826 0870    3.7342 9151 1087     .0018 9035 9168    1.6023 8729 2480
 24    .0416 6666 6667    3.7759 5817 7754     .0017 3611 1111    1.6041 2340 3591
 25    .0400 0000 0000    3.8159 5817 7754     .0016 0000 0000    1.6057 2340 3591

 26    .0384 6153 8462    3.8544 1971 6215+    .0014 7928 9941    1.6072 0269 3532
 27    .0370 3703 7037    3.8914 5675 3252     .0013 7174 2112    1.6085 7443 5644
 28    .0357 1428 5714    3.9271 7103 8966     .0012 7551 0204    1.6098 4994 5848
 29    .0344 8275 8621    3.9616 5379 7587     .0011 8906 0642    1.6110 3900 6490
 30    .0333 3333 3333    3.9949 8713 0920     .0011 1111 1111    1.6121 5011 7602

 31    .0322 5806 4516    4.0272 4519 5437     .0010 4058 2726    1.6131 9070 0328
 32    .0312 5000 0000    4.0584 9519 5437     .0009 7656 2500    1.6141 6726 2828
 33    .0303 0303 0303    4.0887 9822 5740     .0009 1827 3646    1.6150 8553 6473
 34    .0294 1176 4706    4.1182 0999 0445+    .0008 6505 1903    1.6159 5058 8377
 35    .0285 7142 8571    4.1467 8141 9017     .0008 1632 6531    1.6167 6691 4907

 36    .0277 7777 7778    4.1745 5919 6795−    .0007 7160 4938    1.6175 3851 9845+
 37    .0270 2702 7027    4.2015 8622 3822     .0007 3046 0190    1.6182 6898 0035+
 38    .0263 1578 9474    4.2279 0201 3295+    .0006 9252 0776    1.6189 6150 0811
 39    .0256 4102 5641    4.2535 4303 8936     .0006 5746 2196    1.6196 1896 3007
 40    .0250 0000 0000    4.2785 4303 8936     .0006 2500 0000    1.6202 4396 3007

 41    .0243 9024 3902    4.3029 3328 2839     .0005 9488 3998    1.6208 3884 7005−
 42    .0238 0952 3810    4.3267 4280 6648     .0005 6689 3424    1.6214 0574 0429
 43    .0232 5581 3953    4.3499 9862 0602     .0005 4083 2883    1.6219 4657 3311
 44    .0227 2727 2727    4.3727 2589 3329     .0005 1652 8926    1.6224 6310 2237
 45    .0222 2222 2222    4.3949 4811 5551     .0004 9382 7160+   1.6229 5692 9397

 46    .0217 3913 0435−   4.4166 8724 5986     .0004 7258 9792    1.6234 2951 9189
 47    .0212 7659 5745−   4.4379 6384 1731     .0004 5269 3526    1.6238 8221 2716
 48    .0208 3333 3333    4.4587 9717 5064     .0004 3402 7778    1.6243 1624 0494
 49    .0204 0816 3265+   4.4792 0533 8329     .0004 1649 3128    1.6247 3273 3622
 50    .0200 0000 0000    4.4992 0533 8329     .0004 0000 0000    1.6251 3273 3622




TABLE I (continued).

         Δ                   Dibeta               Δ                   Tribeta
  n      1/n                 S(1/n)               1/n²                S(1/n²)

 51    .0196 0784 3137    4.5188 1318 1467     .0003 8446 7512    1.6255 1720 1134
 52    .0192 3076 9231    4.5380 4395 0697     .0003 6982 2485+   1.6258 8702 3619
 53    .0188 6792 4528    4.5569 1187 5226     .0003 5599 8576    1.6262 4302 2195+
 54    .0185 1851 8519    4.5754 3039 3744     .0003 4293 5528    1.6265 8595 7723
 55    .0181 8181 8182    4.5936 1221 1926     .0003 3057 8512    1.6269 1653 6236

 56    .0178 5714 2857    4.6114 6935 4783     .0003 1887 7551    1.6272 3541 3787
 57    .0175 4385 9649    4.6290 1321 4432     .0003 0778 7011    1.6275 4320 0798
 58    .0172 4137 9310    4.6462 5459 3743     .0002 9726 5161    1.6278 4046 5959
 59    .0169 4915 2542    4.6632 0374 6285+    .0002 8727 3772    1.6281 2773 9731
 60    .0166 6666 6667    4.6798 7041 2952     .0002 7777 7778    1.6284 0551 7508

 61    .0163 9344 2623    4.6962 6385 5575−    .0002 6874 4961    1.6286 7426 2469
 62    .0161 2903 2258    4.7123 9288 7833     .0002 6014 5682    1.6289 3440 8151
 63    .0158 7301 5873    4.7282 6590 3706     .0002 5195 2633    1.6291 8636 0784
 64    .0156 2500 0000    4.7438 9090 3706     .0002 4414 0625    1.6294 3050 1409
 65    .0153 8461 5385−   4.7592 7551 9090     .0002 3668 6391    1.6296 6718 7799

 66    .0151 5151 5152    4.7744 2703 4242     .0002 2956 8411    1.6298 9675 6211
 67    .0149 2537 3134    4.7893 5240 7376     .0002 2276 6763    1.6301 1952 2974
 68    .0147 0588 2353    4.8040 5828 9729     .0002 1626 2976    1.6303 3578 5950
 69    .0144 9275 3623    4.8185 5104 3352     .0002 1003 9908    1.6305 4582 5857
 70    .0142 8571 4286    4.8328 3675 7638     .0002 0408 1633    1.6307 4990 7490

 71    .0140 8450 7042    4.8469 2126 4680     .0001 9837 3339    1.6309 4828 0829
 72    .0138 8888 8889    4.8608 1015 3569     .0001 9290 1235−   1.6311 4118 2063
 73    .0136 9863 0137    4.8745 0878 3706     .0001 8765 2468    1.6313 2883 4531
 74    .0135 1351 3514    4.8880 2229 7220     .0001 8261 5047    1.6315 1144 9578
 75    .0133 3333 3333    4.9013 5563 0553     .0001 7777 7778    1.6316 8922 7356

 76    .0131 5789 4737    4.9145 1352 5290     .0001 7313 0194    1.6318 6235 7550+
 77    .0129 8701 2987    4.9275 0053 8277     .0001 6866 2506    1.6320 3102 0056
 78    .0128 2051 2821    4.9403 2105 1097     .0001 6436 5549    1.6321 9538 5605+
 79    .0126 5822 7848    4.9529 7927 8946     .0001 6023 0732    1.6323 5561 6338

80 

•0125 0000 0000 

4*9654 7927 8946 

•0001 6625 0000 

1 -6325 1186 6338 

81 

•0123 4567 9012 

4-9778 2495 7958 

•0001 5241 5790+ 

1-6326 6428 2128 

82 

•0121 9512 1951 

4-9900 2007 9909 

•0001 4872 0999 

1 -6328 1300 3127 

83 

•0120 4819 2771 

5-0020 6827 2680 

' -0001 4515 8949 

1-6329 5816 2076 

84 

•0119 0476 1905" 

5-0139 7303 4585" 

•0001 4172 3356 

1 -6330 9988 5432 

85 

•0117 6470 5882 

5-0257 3774 0467 

•0001 3840 8304 

1-6332 3829 3736 

86 

•0116 2790 6977 

5*0373 6564 7444 

•0001 3520 8221 

1-6333 7350 1957 

87 

•0114 9425 2874 

5*0488 5990 0318 

•0001 3211 7849 

1-6335 0561 9807 

88 

*0113 6363 6364 

5*0602 2353 6681 

•0001 2913 2231 

1*6336 3475 2038 

89 

•0112 3595 5056 

5-0714 5949 1737 

•0001 2624 6686 

1*6337 6099 8724 

90 

•0111 1111 1111 

5-0825 7060 2849 

*0001 2345 6790 

1*6338 8445 5514 

91 

•0109 8901 0989 

5-0935 5961 3838 

•0001 2075 8363 

1-6340 0521 3877 

92 

•0108 6956 5217 

5-1044 2917 9055" 

•0001 1814 7448 

1-6341 2336 1325" 

93 

•0107 6268 8172 

5-1151 8186 7227 

•0001 1562 0303 

1-6342 3898 1628 

94 

•0106 3829 7872 

5-1268 2016 5099 

*0001 1317 3382 

1 *6343 5216 5009 

95 

•0105 2631 5789 

5-1363 4648 0889 

•0001 1080 3324 

1*6344 6295 8333 

96 

•0104 1666 6667 

5-1467 6314 7555+ 

•0001 0850 6944 

1*6345 7146 6278 

97 

•0103 0927 8351 

5-1570 7242 5906 

•0001 0628 1220 

1*6346 7774 6498 

98 

•0102 0408 1633 

5*1672 7650 7539 

*0001 0412 3282 

1-6347 8186 9780 

99 

•0101 0101 0101 

5-1773 7751 7640 

*0001 0203 0405+ 

1*6348 8390 0185" 

m 

•0100 0000 0000 

5*1873 7751 7640 

*0001 0000 0000 

1-6349 8390 0185" 










Rank-Variates and Rank-Intervals
TABLE II.

Table for finding the Derived Beta and Gamma Functions.

        Δ                 Tetrabeta          Δ                 Pentabeta
  n     1/n³              S(1/n³)            1/n⁴              S(1/n⁴)

   1    1.0000 0000 0000  1.0000 0000 0000   1.0000 0000 0000  1.0000 0000 0000
   2    .1250 0000 0000   1.1250 0000 0000   .0625 0000 0000   1.0625 0000 0000
   3    .0370 3703 7037   1.1620 3703 7037   .0123 4567 9012   1.0748 4567 9012
   4    .0156 2500 0000   1.1776 6203 7037   .0039 0625 0000   1.0787 5192 9012
   5    .0080 0000 0000   1.1856 6203 7037   .0016 0000 0000   1.0803 5192 9012
   6    .0046 2962 9630−  1.1902 9166 6667   .0007 7160 4938   1.0811 2353 3951
   7    .0029 1545 1895+  1.1932 0711 8562   .0004 1649 3128   1.0815 4002 7078
   8    .0019 5312 5000   1.1951 6024 3562   .0002 4414 0625   1.0817 8416 7703
   9    .0013 7174 2112   1.1965 3198 5674   .0001 5241 5790+  1.0819 3658 3494
  10    .0010 0000 0000   1.1975 3198 5674   .0001 0000 0000   1.0820 3658 3494
  11    .0007 5131 4801   1.1982 8330 0475+  .0000 6830 1345   1.0821 0488 4839
  12    .0005 7870 3704   1.1988 6200 4179   .0000 4822 5309   1.0821 5311 0148
  13    .0004 5516 6136   1.1993 1717 0314   .0000 3501 2780−  1.0821 8812 2928
  14    .0003 6443 1487   1.1996 8160 1801   .0000 2603 0820+  1.0822 1415 3748
  15    .0002 9629 6296   1.1999 7789 8098   .0000 1975 3086   1.0822 3390 6835−
  16    .0002 4414 0625   1.2002 2203 8723   .0000 1525 8789   1.0822 4916 5624
  17    .0002 0354 1624   1.2004 2558 0347   .0000 1197 3037   1.0822 6113 8660+
  18    .0001 7146 7764   1.2005 9704 8111   .0000 0952 5987   1.0822 7066 4647
  19    .0001 4579 3847   1.2007 4284 1958   .0000 0767 3360+  1.0822 7833 8008
  20    .0001 2500 0000   1.2008 6784 1958   .0000 0625 0000   1.0822 8458 8008
  21    .0001 0797 9700   1.2009 7582 1658   .0000 0514 1890+  1.0822 8972 9898
  22    .0000 9391 4350+  1.2010 6973 6008   .0000 0426 8834   1.0822 9399 8732
  23    .0000 8218 9529   1.2011 5192 5537   .0000 0357 3458   1.0822 9757 2190−
  24    .0000 7233 7963   1.2012 2426 3500+  .0000 0301 4082   1.0823 0058 6272
  25    .0000 6400 0000   1.2012 8826 3500+  .0000 0256 0000   1.0823 0314 6272
  26    .0000 5689 5767   1.2013 4515 9267   .0000 0218 8299   1.0823 0533 4570+
  27    .0000 5080 5263   1.2013 9596 4531   .0000 0188 1676   1.0823 0721 6247
  28    .0000 4555 3936   1.2014 4151 8467   .0000 0162 6926   1.0823 0884 3173
  29    .0000 4100 2091   1.2014 8252 0558   .0000 0141 3865−  1.0823 1025 7038
  30    .0000 3703 7037   1.2015 1955 7595−  .0000 0123 4568   1.0823 1149 1606
  31    .0000 3356 7185−  1.2015 5312 4779   .0000 0108 2812   1.0823 1257 4419
  32    .0000 3051 7578   1.2015 8364 2358   .0000 0095 3674   1.0823 1352 8093
  33    .0000 2782 6474   1.2016 1146 8832   .0000 0084 3226   1.0823 1437 1319
  34    .0000 2544 2703   1.2016 3691 1535−  .0000 0074 8315−  1.0823 1511 9634
  35    .0000 2332 3615+  1.2016 6023 5150−  .0000 0066 6389   1.0823 1578 6023
  36    .0000 2143 3471   1.2016 8166 8620   .0000 0059 5374   1.0823 1638 1397
  37    .0000 1974 2167   1.2017 0141 0788   .0000 0053 3572   1.0823 1691 4970−
  38    .0000 1822 4231   1.2017 1963 5019   .0000 0047 9585   1.0823 1739 4555−
  39    .0000 1685 8005+  1.2017 3649 3024   .0000 0043 2257   1.0823 1782 6811
  40    .0000 1562 5000   1.2017 5211 8024   .0000 0039 0625   1.0823 1821 7436
  41    .0000 1450 9366   1.2017 6662 7389   .0000 0035 3887   1.0823 1857 1323
  42    .0000 1349 7462   1.2017 8012 4852   .0000 0032 1368   1.0823 1889 2691
  43    .0000 1257 7509   1.2017 9270 2361   .0000 0029 2500+  1.0823 1918 5191
  44    .0000 1173 9294   1.2018 0444 1655−  .0000 0026 6802   1.0823 1945 1994
  45    .0000 1097 3937   1.2018 1541 5592   .0000 0024 3865+  1.0823 1969 5859
  46    .0000 1027 3691   1.2018 2568 9283   .0000 0022 3341   1.0823 1991 9200−
  47    .0000 0963 1777   1.2018 3532 1060   .0000 0020 4931   1.0823 2012 4131
  48    .0000 0904 2245+  1.2018 4436 3305+  .0000 0018 8380   1.0823 2031 2511
  49    .0000 0849 9860−  1.2018 5286 3165−  .0000 0017 3467   1.0823 2048 5978
  50    .0000 0800 0000   1.2018 6086 3165−  .0000 0016 0000   1.0823 2064 5978




TABLE II (continued).

        Δ                 Tetrabeta          Δ                 Pentabeta
  n     1/n³              S(1/n³)            1/n⁴              S(1/n⁴)

  51    .0000 0753 8579   1.2018 6840 1744   .0000 0014 7815+  1.0823 2079 3793
  52    .0000 0711 1971   1.2018 7551 3714   .0000 0013 6769   1.0823 2093 0562
  53    .0000 0671 6954   1.2018 8223 0669   .0000 0012 6735−  1.0823 2105 7297
  54    .0000 0635 0658   1.2018 8858 1327   .0000 0011 7605−  1.0823 2117 4902
  55    .0000 0601 0518   1.2018 9459 1845+  .0000 0010 9282   1.0823 2128 4184
  56    .0000 0569 4242   1.2019 0028 6087   .0000 0010 1683   1.0823 2138 5867
  57    .0000 0539 9772   1.2019 0568 5859   .0000 0009 4733   1.0823 2148 0600−
  58    .0000 0512 5261   1.2019 1081 1121   .0000 0008 8367   1.0823 2156 8966
  59    .0000 0486 9047   1.2019 1568 0168   .0000 0008 2526   1.0823 2165 1492
  60    .0000 0462 9630−  1.2019 2030 9797   .0000 0007 7160+  1.0823 2172 8653
  61    .0000 0440 5655+  1.2019 2471 5452   .0000 0007 2224   1.0823 2180 0877
  62    .0000 0419 5898   1.2019 2891 1350+  .0000 0006 7676   1.0823 2186 8553
  63    .0000 0399 9248   1.2019 3291 0599   .0000 0006 3480+  1.0823 2193 2033
  64    .0000 0381 4697   1.2019 3672 5296   .0000 0005 9605−  1.0823 2199 1637
  65    .0000 0364 1329   1.2019 4036 6625−  .0000 0005 6020+  1.0823 2204 7658
  66    .0000 0347 8309   1.2019 4384 4934   .0000 0005 2702   1.0823 2210 0359
  67    .0000 0332 4877   1.2019 4716 9811   .0000 0004 9625+  1.0823 2214 9984
  68    .0000 0318 0338   1.2019 5035 0149   .0000 0004 6770−  1.0823 2219 6754
  69    .0000 0304 4057   1.2019 5339 4206   .0000 0004 4117   1.0823 2224 0871
  70    .0000 0291 5452   1.2019 5630 9658   .0000 0004 1649   1.0823 2228 2520+
  71    .0000 0279 3991   1.2019 5910 3648   .0000 0003 9352   1.0823 2232 1872
  72    .0000 0267 9184   1.2019 6178 2832   .0000 0003 7211   1.0823 2235 9083
  73    .0000 0257 0582   1.2019 6435 3414   .0000 0003 5213   1.0823 2239 4297
  74    .0000 0246 7771   1.2019 6682 1185−  .0000 0003 3348   1.0823 2242 7645−
  75    .0000 0237 0370   1.2019 6919 1556+  .0000 0003 1605−  1.0823 2245 9250−
  76    .0000 0227 8029   1.2019 7146 9584   .0000 0002 9974   1.0823 2248 9224
  77    .0000 0219 0422   1.2019 7366 0006   .0000 0002 8447   1.0823 2251 7671
  78    .0000 0210 7251   1.2019 7576 7257   .0000 0002 7016   1.0823 2254 4687
  79    .0000 0202 8237   1.2019 7779 5494   .0000 0002 5674   1.0823 2257 0361
  80    .0000 0195 3125   1.2019 7974 8619   .0000 0002 4414   1.0823 2259 4775−
  81    .0000 0188 1676   1.2019 8163 0295+  .0000 0002 3231   1.0823 2261 8005+
  82    .0000 0181 3671   1.2019 8344 3966   .0000 0002 2118   1.0823 2264 0123
  83    .0000 0174 8903   1.2019 8519 2869   .0000 0002 1071   1.0823 2266 1194
  84    .0000 0168 7183   1.2019 8688 0052   .0000 0002 0086   1.0823 2268 1280−
  85    .0000 0162 8333   1.2019 8850 8385−  .0000 0001 9157   1.0823 2270 0437
  86    .0000 0157 2189   1.2019 9008 0573   .0000 0001 8281   1.0823 2271 8718
  87    .0000 0151 8596   1.2019 9159 9169   .0000 0001 7455+  1.0823 2273 6173
  88    .0000 0146 7412   1.2019 9306 6581   .0000 0001 6675+  1.0823 2275 2848
  89    .0000 0141 8502   1.2019 9448 5083   .0000 0001 5938   1.0823 2276 8787
  90    .0000 0137 1742   1.2019 9585 6825+  .0000 0001 5242   1.0823 2278 4028
  91    .0000 0132 7015−  1.2019 9718 3840   .0000 0001 4583   1.0823 2279 8611
  92    .0000 0128 4211   1.2019 9846 8052   .0000 0001 3959   1.0823 2281 2570−
  93    .0000 0124 3229   1.2019 9971 1281   .0000 0001 3368   1.0823 2282 5938
  94    .0000 0120 3972   1.2020 0091 5253   .0000 0001 2808   1.0823 2283 8746
  95    .0000 0116 6351   1.2020 0208 1604   .0000 0001 2277   1.0823 2285 1023
  96    .0000 0113 0281   1.2020 0321 1884   .0000 0001 1774   1.0823 2286 2797
  97    .0000 0109 5683   1.2020 0430 7567   .0000 0001 1296   1.0823 2287 4093
  98    .0000 0106 2482   1.2020 0537 0050−  .0000 0001 0842   1.0823 2288 4934
  99    .0000 0103 0610+  1.2020 0640 0660   .0000 0001 0410   1.0823 2289 5344
 100    .0000 0100 0000   1.2020 0740 0660   .0000 0001 0000   1.0823 2290 5344
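The columns of Tables I and II are partial sums of inverse powers of the integers, so any entry can be re-derived exactly. The following is a minimal sketch, not the method used to compute the printed tables; the function names `inverse_power_sums` and `to_decimal` are illustrative assumptions. It uses exact rational arithmetic so that the twelfth decimal is not affected by floating-point error.

```python
from fractions import Fraction


def inverse_power_sums(power, n_max):
    """Return rows (n, 1/n**power, S_n), S_n being the sum of 1/j**power for j <= n."""
    rows, running = [], Fraction(0)
    for n in range(1, n_max + 1):
        term = Fraction(1, n ** power)
        running += term
        rows.append((n, term, running))
    return rows


def to_decimal(value, places=12):
    """Format an exact fraction as a decimal string rounded to `places` places."""
    scaled = round(value * 10 ** places)  # exact rational rounded to an integer
    whole, frac = divmod(scaled, 10 ** places)
    return f"{whole}.{frac:0{places}d}"


if __name__ == "__main__":
    # Last row (n = 100) of each of the four summed columns.
    for power in (1, 2, 3, 4):
        n, term, total = inverse_power_sums(power, 100)[-1]
        print(power, n, to_decimal(term), to_decimal(total))
```

Differences of two such sums give the quantities needed for a rank or an interval, for example S(1/n) at n = 100 minus S(1/n) at n = 50 for the 50th rank in a sample of 100.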



(3) Moments of the Rank-Variates in Samples from the Exponential Curve.

The equation to the exponential curve being

$$y = \frac{N}{\sigma}\,e^{-x/\sigma}, \tag{i}$$

if we measure the abscissae in terms of $\sigma$ as unit, and take $\alpha_x$ to be the proportional area from the ordinate $y$ at $x$ to the tail, or

$$\alpha_x = \int_x^{\infty} y\,dx \Big/ N = e^{-x}, \tag{ii}$$

we have $x = -\log\alpha_x$.

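Then the $k$th moment coefficient $\mu'_k$ of the character of the individual in the $q$th rank about the origin $x = 0$ is given by

$$\mu'_k = \frac{(-1)^k\int_0^1 (1-\alpha_x)^{q-1}\alpha_x^{\,n-q}(\log\alpha_x)^k\,d\alpha_x}{B(q,\,n-q+1)}
= \frac{(-1)^k}{B(q,\,n-q+1)}\,\frac{d^k}{dn^k}\int_0^1 (1-\alpha_x)^{q-1}\alpha_x^{\,n-q}\,d\alpha_x
= \frac{(-1)^k}{B(q,\,n-q+1)}\,\frac{d^k B(q,\,n-q+1)}{dn^k}. \tag{iii}$$

We will write $\mathfrak{B}(q, n)$ for this B-function or where there is no danger of confusion simply $\mathfrak{B}$. The next task is to transfer these moment coefficients to the mean by the aid of the well-known formula

$$\mu_k = \mu'_k - k\,\mu'_{k-1}\mu'_1 + \frac{k(k-1)}{2!}\,\mu'_{k-2}\,\mu_1'^{\,2} - \dots$$

From this we easily deduce by (iii) that

$$\mu_k = \frac{(-1)^k}{\mathfrak{B}}\left(\frac{d}{dn} + \mu'_1\right)^{k}\mathfrak{B}. \tag{iv}$$

Now the operator $\dfrac{d}{dn} + \mu'_1$ applied to $\mathfrak{B}$ gives the result $\dfrac{d\mathfrak{B}}{dn} + \mu'_1\mathfrak{B}$. Hence, the first moment about the mean being zero,

$$\frac{d\mathfrak{B}}{dn} + \mu'_1\,\mathfrak{B} = 0. \tag{v}$$

Thus the mean character ${}_n\bar{x}_q$ in the $q$th rank in samples of $n$ from an exponential curve is given by

$${}_n\bar{x}_q = -\frac{d\log\mathfrak{B}}{dn}, \tag{vi}$$

or reinserting $\sigma$,

$${}_n\bar{x}_q = \sigma\left(\frac{1}{n} + \frac{1}{n-1} + \dots + \frac{1}{n-q+1}\right). \tag{vi bis}$$

Again,

$$\mu_2 = \frac{1}{\mathfrak{B}}\left(\frac{d}{dn} + \mu'_1\right)^{2}\mathfrak{B}
= \frac{d^2\log\mathfrak{B}}{dn^2} + \left(\frac{d\log\mathfrak{B}}{dn} + \mu'_1\right)^{2}.$$

The squared factor vanishes by (v) and we have

$$\mu_2 = \frac{d^2\log\mathfrak{B}}{dn^2}, \tag{vii}$$

or reinserting $\sigma^2$,

$${}_n\sigma_q^2 = \sigma^2\left(\frac{1}{n^2} + \frac{1}{(n-1)^2} + \dots + \frac{1}{(n-q+1)^2}\right). \tag{vii bis}$$

Proceeding to the third moment by applying the operator $\dfrac{d}{dn} + \mu'_1$ once more, we have

$$\mu_3 = -\frac{d^3\log\mathfrak{B}}{dn^3}, \tag{viii}$$

or reinserting $\sigma^3$,

$$\mu_3 = 2\sigma^3\left(\frac{1}{n^3} + \frac{1}{(n-1)^3} + \dots + \frac{1}{(n-q+1)^3}\right). \tag{viii bis}$$

Again,

$$\mu_4 = \frac{d^4\log\mathfrak{B}}{dn^4} + 3\left(\frac{d^2\log\mathfrak{B}}{dn^2}\right)^{2}, \tag{ix}$$

or reinserting $\sigma^4$,

$$\mu_4 = \sigma^4\left\{6\left(\frac{1}{n^4} + \frac{1}{(n-1)^4} + \dots + \frac{1}{(n-q+1)^4}\right) + 3\left(\frac{1}{n^2} + \frac{1}{(n-1)^2} + \dots + \frac{1}{(n-q+1)^2}\right)^{2}\right\}. \tag{ix bis}$$

Similarly,

$$\mu_5 = -\left(\frac{d^5\log\mathfrak{B}}{dn^5} + 10\,\frac{d^3\log\mathfrak{B}}{dn^3}\,\frac{d^2\log\mathfrak{B}}{dn^2}\right)$$

$$\phantom{\mu_5} = \sigma^5\left[20\left(\frac{1}{n^2} + \frac{1}{(n-1)^2} + \dots + \frac{1}{(n-q+1)^2}\right)\left(\frac{1}{n^3} + \frac{1}{(n-1)^3} + \dots + \frac{1}{(n-q+1)^3}\right) + 24\left(\frac{1}{n^5} + \frac{1}{(n-1)^5} + \dots + \frac{1}{(n-q+1)^5}\right)\right]. \tag{x}$$

Still higher moments can be found readily by repeated use of the operator $\dfrac{d}{dn} + \mu'_1$. It is clear that such moment coefficients are all expressible in terms of the hyperbeta functions and these in terms of the series of the inverse powers of integer numbers.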
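The mean and variance formulae (vi bis) and (vii bis) can be checked empirically. The sketch below is an illustration only and forms no part of the original argument: it draws repeated samples from an exponential curve with Python's standard `random.expovariate`, sorts them, and compares the observed mean and variance of the qth rank with the sums of inverse powers. The function names, the number of trials and the seed are assumptions chosen for the example.

```python
import random


def simulate_rank(n, q, trials=20000, sigma=1.0, seed=1):
    """Empirical mean and variance of the q-th smallest of n exponential variates."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        sample = sorted(rng.expovariate(1.0 / sigma) for _ in range(n))
        values.append(sample[q - 1])  # q-th rank, counting from the start of the curve
    mean = sum(values) / trials
    var = sum((v - mean) ** 2 for v in values) / trials
    return mean, var


def formula_rank(n, q, sigma=1.0):
    """Mean and variance of the q-th rank from (vi bis) and (vii bis)."""
    js = range(n - q + 1, n + 1)
    mean = sigma * sum(1.0 / j for j in js)
    var = sigma ** 2 * sum(1.0 / j ** 2 for j in js)
    return mean, var


if __name__ == "__main__":
    print(simulate_rank(5, 3))
    print(formula_rank(5, 3))
```

With 20,000 trials the two pairs of values should agree to about two decimal places; closer agreement needs more trials.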

I have succeeded in finding the analytical form of the frequency distribution of the variate of the individual in the $q$th rank; this will be given later. Since each additional moment involves the sum of a series of one degree higher inverse powers, it was most unlikely that a curve with a linear relation between successive moments would theoretically describe this frequency; it would have involved a relation between series of inverse powers of a very improbable character. In actual practice, however, I find that we get an adequate representation of the frequency by using a Pearson curve found from ${}_n\sigma_q$, and the corresponding $\beta_1$ and $\beta_2$. These latter are

$$\beta_1 = \frac{4\left(\dfrac{1}{n^3} + \dfrac{1}{(n-1)^3} + \dots + \dfrac{1}{(n-q+1)^3}\right)^{2}}{\left(\dfrac{1}{n^2} + \dfrac{1}{(n-1)^2} + \dots + \dfrac{1}{(n-q+1)^2}\right)^{3}}, \tag{xi}$$

$$\beta_2 = 3 + \frac{6\left(\dfrac{1}{n^4} + \dfrac{1}{(n-1)^4} + \dots + \dfrac{1}{(n-q+1)^4}\right)}{\left(\dfrac{1}{n^2} + \dfrac{1}{(n-1)^2} + \dots + \dfrac{1}{(n-q+1)^2}\right)^{2}}. \tag{xii}$$


A particular case is easy. Suppose $q$ to represent the first individual. Then

$${}_n\sigma_q = \sigma/n,\quad \beta_1 = 4,\quad \beta_2 = 9,\ \text{etc.},$$

but these are the constants belonging to an exponential curve of standard deviation $\sigma/n$. This result has already been obtained by E. S. Pearson and J. Neyman*.

Let us now consider the last individual in a sample of 100, or $q = 100$; then we easily obtain, from our Tables I and II,

$${}_{100}\bar{x}_{100} = \sigma \times 5.187{,}3775+,\qquad {}_{100}\sigma_{100} = \sigma \times 1.278{,}665-,$$

$$\beta_1 = 1.32231,\qquad \beta_2 = 5.59593.$$

Let us now consider an individual near the median, $q = 50$, say, in a sample of 100. Here

$${}_{100}\bar{x}_{50} = \sigma\,(5.187{,}3775 - 4.499{,}2053) = \sigma \times .688{,}1722,$$

$${}_{100}\sigma^2_{50} = \sigma^2\,(1.634{,}9839 - 1.625{,}1327) = \sigma^2\,(.009{,}8512),$$

and

$${}_{100}\sigma_{50} = \sigma \times .099{,}253,\qquad \beta_1 = .089{,}845,\qquad \beta_2 = 3.139{,}689.$$
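The constants of any single rank follow from four sums of inverse powers taken over the integers n − q + 1, ..., n. As a hedged sketch of that calculation (the helper names `power_sum` and `rank_constants` are assumptions made for this illustration, and ordinary floating-point arithmetic is used instead of the twelve-figure tables), the case q = 50, n = 100 treated above can be reproduced as follows.

```python
def power_sum(power, lo, hi):
    """Sum of 1/j**power for j = lo, ..., hi."""
    return sum(1.0 / j ** power for j in range(lo, hi + 1))


def rank_constants(n, q, sigma=1.0):
    """Mean, standard deviation, beta_1 and beta_2 of the q-th rank in samples of n."""
    lo, hi = n - q + 1, n
    s1, s2, s3, s4 = (power_sum(p, lo, hi) for p in (1, 2, 3, 4))
    return {
        "mean": sigma * s1,             # (vi bis)
        "sd": sigma * s2 ** 0.5,        # square root of (vii bis)
        "beta1": 4 * s3 ** 2 / s2 ** 3,
        "beta2": 3 + 6 * s4 / s2 ** 2,
    }


if __name__ == "__main__":
    for name, value in rank_constants(100, 50).items():
        print(f"{name}: {value:.6f}")
```

For q = 1 the same function returns beta_1 = 4 and beta_2 = 9 whatever the value of n, which is the exponential case noted in the text.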

Remembering that for the first rank

$$\beta_1 = 4 \quad\text{and}\quad \beta_2 = 9,$$

we see that even for a sample of 100 the distribution of extreme ranks differs much from normality, the last ranks being more nearly normal than the early ranks, while ranks in the neighbourhood of the median are approaching normality but still differ from it both in skewness and kurtosis. It is clear that the approach to a normal distribution depends not only on the size of the sample, but very largely on the rank of the individual in the sample. It would be an interesting task to trace for different values of $q$ the family of curves $\beta_1$, $\beta_2$ as $n$ is varied. Each member of the family appears to run through several Pearson-types.


* Biometrika, Vol. xx A, p. 223. The reader will find in that paper another method of treating samples from the exponential curve (pp. 221-30).








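(4) Product Moments of Rank-Variates in Samples from the Exponential Curve.

We next turn to the mean product of the variates of two ranked characters $x_q$, $x_{q'}$ in a sample of size $n$. If as before we measure the area ratio from $x$ to $\infty$ we have

$$x = -\sigma\log\alpha_x,$$

and

$$\{x_{q'}^{\,s}\,x_q^{\,t}\} = (-1)^{s+t}\,\chi\int_0^1 d\alpha_{x'}\,\alpha_{x'}^{\,n-q'}(\log\alpha_{x'})^{s}\int_{\alpha_{x'}}^{1} d\alpha_x\,(1-\alpha_x)^{q-1}(\alpha_x-\alpha_{x'})^{q'-q-1}(\log\alpha_x)^{t},$$

where $x_{q'} > x_q$ and $\alpha_{x'} < \alpha_x$, and

$$\chi = \frac{\Gamma(n+1)\,\sigma^{s+t}}{\Gamma(q)\,\Gamma(q'-q)\,\Gamma(n-q'+1)}.$$

We have

$$\{x_{q'}^{\,s}\,x_q^{\,t}\} = (-1)^{s+t}\,\chi\,\frac{d^{s}}{dn^{s}}\int_0^1 d\alpha_{x'}\,\alpha_{x'}^{\,n-q'}\,u_{q'-q-1},$$

where

$$u_{q'-q-1} = \int_{\alpha_{x'}}^{1} d\alpha_x\,(1-\alpha_x)^{q-1}(\alpha_x-\alpha_{x'})^{q'-q-1}(\log\alpha_x)^{t}.$$

Now integrate by parts and note that $\alpha_{x'} = 0$ at the lower limit and $u_{q'-q-1} = 0$ at the upper limit. Hence

$$\{x_{q'}^{\,s}\,x_q^{\,t}\} = (-1)^{s+t}\,\chi\,\frac{d^{s}}{dn^{s}}\left(-\frac{1}{n-q'+1}\int_0^1 d\alpha_{x'}\,\alpha_{x'}^{\,n-q'+1}\,\frac{du_{q'-q-1}}{d\alpha_{x'}}\right).$$

But

$$\frac{du_{q'-q-1}}{d\alpha_{x'}} = -\Big[(1-\alpha_x)^{q-1}(\alpha_x-\alpha_{x'})^{q'-q-1}(\log\alpha_x)^{t}\Big]_{\alpha_x=\alpha_{x'}} - (q'-q-1)\int_{\alpha_{x'}}^{1} d\alpha_x\,(1-\alpha_x)^{q-1}(\alpha_x-\alpha_{x'})^{q'-q-2}(\log\alpha_x)^{t},$$

and the term between square brackets vanishing,

$$\{x_{q'}^{\,s}\,x_q^{\,t}\} = (-1)^{s+t}\,\chi\,\frac{d^{s}}{dn^{s}}\left(\frac{q'-q-1}{n-q'+1}\int_0^1 d\alpha_{x'}\,\alpha_{x'}^{\,n-q'+1}\,u_{q'-q-2}\right).$$

Now this process may be repeated until the whole power of $\alpha_x - \alpha_{x'}$ is transferred to $\alpha_{x'}$ and, when this is accomplished, we have

$$\{x_{q'}^{\,s}\,x_q^{\,t}\} = (-1)^{s+t}\,\chi\,\frac{d^{s}}{dn^{s}}\left(\frac{(q'-q-1)(q'-q-2)\dots 1}{(n-q'+1)(n-q'+2)\dots(n-q-1)}\int_0^1 d\alpha_{x'}\,\alpha_{x'}^{\,n-q-1}\int_{\alpha_{x'}}^{1} d\alpha_x\,(1-\alpha_x)^{q-1}(\log\alpha_x)^{t}\right). \tag{xiv}$$

The double integral may be expressed as

$$\int_0^1 u_0\,\frac{d\big(\alpha_{x'}^{\,n-q}\big)}{n-q} = \frac{1}{n-q}\left(\Big[\alpha_{x'}^{\,n-q}\,u_0\Big]_0^1 - \int_0^1 \alpha_{x'}^{\,n-q}\,\frac{du_0}{d\alpha_{x'}}\,d\alpha_{x'}\right),$$

where the part outside the integration vanishes and the integral

$$\int_0^1 \alpha_{x'}^{\,n-q}\,\frac{du_0}{d\alpha_{x'}}\,d\alpha_{x'} = -\int_0^1 \alpha_{x'}^{\,n-q}(1-\alpha_{x'})^{q-1}(\log\alpha_{x'})^{t}\,d\alpha_{x'} = -\frac{d^{t}}{dn^{t}}B(q,\,n-q+1).$$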




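We can accordingly write (xiv) in the form

$$\{x_{q'}^{\,s}\,x_q^{\,t}\} = (-1)^{s+t}\sigma^{s+t}\,\frac{\Gamma(n+1)}{\Gamma(q'-q)\,\Gamma(q)\,\Gamma(n-q'+1)}\,\frac{d^{s}}{dn^{s}}\left(\frac{\Gamma(q'-q)\,\Gamma(n-q'+1)}{\Gamma(n-q+1)}\,\frac{d^{t}}{dn^{t}}B(q,\,n-q+1)\right)$$

$$= (-1)^{s+t}\sigma^{s+t}\,\frac{1}{B(q'-q,\,n-q'+1)\,B(q,\,n-q+1)}\,\frac{d^{s}}{dn^{s}}\left(B(q'-q,\,n-q'+1)\,\frac{d^{t}}{dn^{t}}B(q,\,n-q+1)\right). \tag{xv}$$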

The product moments of the various orders are taken about the origin of $x$, and must be reduced to the mean by the usual process. The formula (xv) contains all the results previously reached. If we write $s = 0$, $t = k$, we have

$$\{x_q^{\,k}\} = \frac{(-1)^k\sigma^k}{B(q,\,n-q+1)}\,\frac{d^kB(q,\,n-q+1)}{dn^k},\quad\text{see p. 210, (iii)}.$$

If we put $t = 0$, $s = k$, we obtain, after some slight reductions,

$$\{x_{q'}^{\,k}\} = \frac{(-1)^k\sigma^k}{B(q',\,n-q'+1)}\,\frac{d^kB(q',\,n-q'+1)}{dn^k}.$$

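If we take $s = 1$, $t = 1$, we have

$$\{x_q x_{q'}\} = \sigma^2\,\frac{1}{B(q'-q,\,n-q'+1)\,B(q,\,n-q+1)}\,\frac{d}{dn}\left(B(q'-q,\,n-q'+1)\,\frac{dB(q,\,n-q+1)}{dn}\right)$$

$$= \sigma^2\left(\frac{d\log B(q'-q,\,n-q'+1)}{dn}\,\frac{d\log B(q,\,n-q+1)}{dn} + \frac{1}{B(q,\,n-q+1)}\,\frac{d^2B(q,\,n-q+1)}{dn^2}\right).$$

Now

$$\frac{d\log B(q'-q,\,n-q'+1)}{dn} = \frac{\Gamma'(n-q'+1)}{\Gamma(n-q'+1)} - \frac{\Gamma'(n-q+1)}{\Gamma(n-q+1)}$$

$$= \frac{\Gamma'(n-q'+1)}{\Gamma(n-q'+1)} - \frac{\Gamma'(n+1)}{\Gamma(n+1)} - \left(\frac{\Gamma'(n-q+1)}{\Gamma(n-q+1)} - \frac{\Gamma'(n+1)}{\Gamma(n+1)}\right)$$

$$= \frac{d\log B(q',\,n-q'+1)}{dn} - \frac{d\log B(q,\,n-q+1)}{dn}.$$

Thus

$$\{x_q x_{q'}\} = \bar{x}_q\bar{x}_{q'} - \bar{x}_q^{\,2} + \frac{\sigma^2}{B(q,\,n-q+1)}\,\frac{d^2B(q,\,n-q+1)}{dn^2},$$

or

$$\{x_q x_{q'}\} - \bar{x}_q\bar{x}_{q'} = \sigma_q^2, \tag{xvi}$$

and accordingly

$$\frac{\{x_q x_{q'}\} - \bar{x}_q\bar{x}_{q'}}{\sigma_q\,\sigma_{q'}} = \frac{\sigma_q}{\sigma_{q'}}, \tag{xvi bis}$$

a very simple formula for the correlation coefficient of $x_q$ and $x_{q'}$, $q' > q$.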
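The correlation formula (xvi bis) lends itself to a direct numerical check. The following is a rough sketch under stated assumptions, not part of the original derivation: it simulates sorted exponential samples with the standard library and compares the observed correlation of the qth and q'th ranks with the ratio of their standard deviations. The function names, trial count and seed are hypothetical choices for the illustration.

```python
import random


def simulated_correlation(n, q, qp, trials=20000, seed=7):
    """Observed correlation of the q-th and qp-th ranks (q < qp) in samples of n."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(trials):
        sample = sorted(rng.expovariate(1.0) for _ in range(n))
        xs.append(sample[q - 1])
        ys.append(sample[qp - 1])
    mx, my = sum(xs) / trials, sum(ys) / trials
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / trials
    vx = sum((x - mx) ** 2 for x in xs) / trials
    vy = sum((y - my) ** 2 for y in ys) / trials
    return cov / (vx * vy) ** 0.5


def predicted_correlation(n, q, qp):
    """sigma_q / sigma_q' from (xvi bis), the variances being sums of inverse squares."""
    def variance(rank):
        return sum(1.0 / j ** 2 for j in range(n - rank + 1, n + 1))
    return (variance(q) / variance(qp)) ** 0.5


if __name__ == "__main__":
    print(simulated_correlation(10, 3, 7), predicted_correlation(10, 3, 7))
```

The scale of the parent curve does not enter, since the correlation coefficient is unchanged when both variates are multiplied by the same constant.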









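Now consider the interval between the $q$th and $q'$th ranks $= x_{q'} - x_q$ with a mean value $\bar{x}_{q'} - \bar{x}_q$. Then

$$\sigma^2_{x_{q'}-x_q} = \big\{\big((x_{q'} - \bar{x}_{q'}) - (x_q - \bar{x}_q)\big)^2\big\} = \sigma^2_{q'} - \sigma^2_q. \tag{xvii}$$

Let us next investigate the correlation coefficient between the intervals $x_{q'} - x_q$ with $x_{p'} - x_p$,

$$q' > q > p' > p.$$

The mean product is

$$\big\{\big((x_{q'} - x_q) - (\bar{x}_{q'} - \bar{x}_q)\big)\big((x_{p'} - x_p) - (\bar{x}_{p'} - \bar{x}_p)\big)\big\}$$

$$= \big\{\big((x_{q'} - \bar{x}_{q'}) - (x_q - \bar{x}_q)\big)\big((x_{p'} - \bar{x}_{p'}) - (x_p - \bar{x}_p)\big)\big\}$$

$$= \big\{(x_{q'} - \bar{x}_{q'})(x_{p'} - \bar{x}_{p'}) + (x_q - \bar{x}_q)(x_p - \bar{x}_p) - (x_{q'} - \bar{x}_{q'})(x_p - \bar{x}_p) - (x_q - \bar{x}_q)(x_{p'} - \bar{x}_{p'})\big\}$$

$$= \sigma_{q'}\sigma_{p'}\,r_{x_{p'}x_{q'}} + \sigma_q\sigma_p\,r_{x_p x_q} - \sigma_p\sigma_{q'}\,r_{x_p x_{q'}} - \sigma_{p'}\sigma_q\,r_{x_{p'} x_q}$$

$$= \sigma^2_{p'} + \sigma^2_p - \sigma^2_p - \sigma^2_{p'} = 0,$$

or, the correlation coefficient between any two non-covering intervals is zero.

Since

$$x_{q'} - x_q = (x_{q'} - x_{q'-1}) + (x_{q'-1} - x_{q'-2}) + (x_{q'-2} - x_{q'-3}) + \dots + (x_{q+1} - x_q),$$

and

$$\bar{x}_{q'} - \bar{x}_q = (\bar{x}_{q'} - \bar{x}_{q'-1}) + (\bar{x}_{q'-1} - \bar{x}_{q'-2}) + (\bar{x}_{q'-2} - \bar{x}_{q'-3}) + \dots + (\bar{x}_{q+1} - \bar{x}_q),$$

it follows by subtracting, squaring and taking mean values that

$$\sigma^2_{x_{q'}-x_q} = \sigma^2_{x_{q'}-x_{q'-1}} + \sigma^2_{x_{q'-1}-x_{q'-2}} + \dots + \sigma^2_{x_{q+1}-x_q}$$

$$= \sigma^2\left(\frac{1}{(n-q'+1)^2} + \frac{1}{(n-q'+2)^2} + \frac{1}{(n-q'+3)^2} + \dots + \frac{1}{(n-q)^2}\right)$$

$$= \sigma^2_{q'} - \sigma^2_q,\ \text{as before}.$$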
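The additive property just stated can be verified exactly in rational arithmetic. The lines below are an illustrative sketch only (the function names are assumptions, and σ is taken as unit): the variance of the interval between the qth and q'th ranks, built up from the adjacent intervals, is compared with the difference of the two rank variances.

```python
from fractions import Fraction


def rank_variance(n, q):
    """Variance of the q-th rank (sigma = 1): sum of 1/j**2 for j = n-q+1, ..., n."""
    return sum(Fraction(1, j * j) for j in range(n - q + 1, n + 1))


def interval_variance(n, q, qp):
    """Variance of x_q' - x_q as the sum of the adjacent-interval variances,
    i.e. the sum of 1/j**2 for j = n-q'+1, ..., n-q."""
    return sum(Fraction(1, j * j) for j in range(n - qp + 1, n - q + 1))


if __name__ == "__main__":
    n, q, qp = 20, 5, 12
    print(interval_variance(n, q, qp))
    print(rank_variance(n, qp) - rank_variance(n, q))
```

Both printed fractions are identical, and for adjacent ranks the interval variance reduces to 1/(n − q)².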

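We pass now to higher product moments of the $q$th and $q'$th ranks* and to the higher moments of the interrank interval. For brevity we shall write

$$B(q',\,n-q'+1) = \mathfrak{B}' \quad\text{and}\quad B(q,\,n-q+1) = \mathfrak{B},$$

while the corresponding hyperbeta functions will be

$$\mathfrak{b}'_1 = \frac{d}{dn}\log\mathfrak{B}',\qquad \mathfrak{b}_1 = \frac{d}{dn}\log\mathfrak{B} = \frac{d}{dn}\log B(q,\,n-q+1),$$

and so on, $\mathfrak{b}_k$ and $\mathfrak{b}'_k$ standing for the $k$th derivatives with respect to $n$ of $\log\mathfrak{B}$ and $\log\mathfrak{B}'$.

We note that

$$\frac{1}{B(q,\,n-q+1)}\,\frac{dB(q,\,n-q+1)}{dn} = \mathfrak{b}_1, \tag{xviii}$$

$$\frac{1}{B(q,\,n-q+1)}\,\frac{d^2B(q,\,n-q+1)}{dn^2} = \mathfrak{b}_2 + \mathfrak{b}_1^2, \tag{xix}$$

$$\frac{1}{B(q,\,n-q+1)}\,\frac{d^3B(q,\,n-q+1)}{dn^3} = \mathfrak{b}_3 + 3\mathfrak{b}_2\mathfrak{b}_1 + \mathfrak{b}_1^3, \tag{xx}$$

$$\frac{1}{B(q,\,n-q+1)}\,\frac{d^4B(q,\,n-q+1)}{dn^4} = \mathfrak{b}_4 + 3\mathfrak{b}_2^2 + 4\mathfrak{b}_3\mathfrak{b}_1 + 6\mathfrak{b}_1^2\mathfrak{b}_2 + \mathfrak{b}_1^4, \tag{xxi}$$

while dropping the affix $n$, we have for the $q$th rank

$$\bar{x}_q = -\sigma\mathfrak{b}_1,\qquad \sigma_q^2 = \sigma^2\mathfrak{b}_2,\qquad \mu_3 = -\sigma^3\mathfrak{b}_3,\qquad \mu_4 = \sigma^4(\mathfrak{b}_4 + 3\mathfrak{b}_2^2). \tag{xxii}$$

* We suppose throughout $q' > q$, i.e. $x_{q'} > x_q$.

Starting from (xiv) we can write it in the simpler form below by expanding the B-functions in Γ-functions

$$\{x_{q'}^{\,s}\,x_q^{\,t}\} = (-1)^{s+t}\sigma^{s+t}\,\frac{1}{\mathfrak{B}'}\,\frac{d^{s}}{dn^{s}}\left(\frac{\mathfrak{B}'}{\mathfrak{B}}\,\frac{d^{t}\mathfrak{B}}{dn^{t}}\right). \tag{xxiii}$$

Here by aid of equations (xv) to (xviii), or still further such equations, it is fairly easy to obtain any required product moment about the start of the exponential curve as origin.

The third order products are

(i) $t = 1$, $s = 2$:

$$\{x_q x_{q'}^2\} = -\sigma^3\,\frac{1}{\mathfrak{B}'}\,\frac{d^2}{dn^2}\left(\frac{\mathfrak{B}'}{\mathfrak{B}}\,\frac{d\mathfrak{B}}{dn}\right) = -\sigma^3\,\frac{1}{\mathfrak{B}'}\,\frac{d^2}{dn^2}\big(\mathfrak{B}'\mathfrak{b}_1\big),$$

and thus by (xviii) and (xix),

$$\{x_q x_{q'}^2\} = -\sigma^3\big(\mathfrak{b}_1\mathfrak{b}'_2 + \mathfrak{b}_1\mathfrak{b}_1'^{\,2} + 2\mathfrak{b}'_1\mathfrak{b}_2 + \mathfrak{b}_3\big), \tag{xxiv}$$

or by aid of (xix),

$$\{x_q x_{q'}^2\} = \bar{x}_q\sigma^2_{q'} + \bar{x}_q\bar{x}^2_{q'} + 2\bar{x}_{q'}\sigma^2_q + \mu_3(q). \tag{xxiv bis}$$

(ii) $t = 2$, $s = 1$:

$$\{x_q^2 x_{q'}\} = -\sigma^3\,\frac{1}{\mathfrak{B}'}\,\frac{d}{dn}\left(\frac{\mathfrak{B}'}{\mathfrak{B}}\,\frac{d^2\mathfrak{B}}{dn^2}\right) = -\sigma^3\,\frac{1}{\mathfrak{B}'}\,\frac{d}{dn}\Big(\mathfrak{B}'\big(\mathfrak{b}_2 + \mathfrak{b}_1^2\big)\Big)$$

$$= -\sigma^3\big(\mathfrak{b}'_1(\mathfrak{b}_2 + \mathfrak{b}_1^2) + \mathfrak{b}_3 + 2\mathfrak{b}_1\mathfrak{b}_2\big)$$

$$= -\sigma^3\big(\mathfrak{b}'_1\mathfrak{b}_2 + \mathfrak{b}'_1\mathfrak{b}_1^2 + 2\mathfrak{b}_1\mathfrak{b}_2 + \mathfrak{b}_3\big), \tag{xxv}$$

or, we may put it in the form

$$\{x_q^2 x_{q'}\} = \bar{x}_{q'}\sigma^2_q + \bar{x}^2_q\bar{x}_{q'} + 2\bar{x}_q\sigma^2_q + \mu_3(q). \tag{xxv bis}$$

We are now in a position to find

$$\{(x_{q'} - x_q)^3\} = \{x_{q'}^3\} - 3\{x_{q'}^2 x_q\} + 3\{x_{q'} x_q^2\} - \{x_q^3\}$$

$$= \mu'_3(q') + 3\bar{x}_{q'}\sigma^2_q - 3\bar{x}_q\sigma^2_{q'} + 3\bar{x}_{q'}\bar{x}^2_q - 3\bar{x}^2_{q'}\bar{x}_q + 6\bar{x}_q\sigma^2_q - 6\bar{x}_{q'}\sigma^2_q - \mu'_3(q), \tag{xxvi}$$

where $\mu_3(q)$, $\mu_3(q')$, $\mu'_3(q)$, $\mu'_3(q')$ denote respectively the third moment coefficients of the $q$th and $q'$th ranks respectively about their means and about the start of the exponential curve.

It is now easy to obtain the third moment coefficient of the interval between the $q$th and $q'$th ranked variates about its mean $\bar{x}_{q'} - \bar{x}_q$.

$$\mu_3(x_{q'} - x_q) = \mu_3(q'-q),\ \text{say},$$

$$= \big\{\big(x_{q'} - x_q - (\bar{x}_{q'} - \bar{x}_q)\big)^3\big\}$$

$$= \{(x_{q'} - x_q)^3\} - 3\{(x_{q'} - x_q)^2\}(\bar{x}_{q'} - \bar{x}_q) + 2(\bar{x}_{q'} - \bar{x}_q)^3$$

$$= \{(x_{q'} - x_q)^3\} - 3\big\{\big(x_{q'} - x_q - (\bar{x}_{q'} - \bar{x}_q)\big)^2\big\}(\bar{x}_{q'} - \bar{x}_q) - (\bar{x}_{q'} - \bar{x}_q)^3$$

$$= \{(x_{q'} - x_q)^3\} - 3\sigma^2_{x_{q'}-x_q}(\bar{x}_{q'} - \bar{x}_q) - (\bar{x}_{q'} - \bar{x}_q)^3.$$

Now substitute from (xxvi) and (xvii), and we have

$$\mu_3(q'-q) = \mu'_3(q') + 3\bar{x}_{q'}\sigma^2_q - 3\bar{x}_q\sigma^2_{q'} + 3\bar{x}_{q'}\bar{x}^2_q - 3\bar{x}^2_{q'}\bar{x}_q + 6\bar{x}_q\sigma^2_q - 6\bar{x}_{q'}\sigma^2_q$$

$$\qquad - \mu'_3(q) - 3(\sigma^2_{q'} - \sigma^2_q)(\bar{x}_{q'} - \bar{x}_q) - \bar{x}^3_{q'} + 3\bar{x}^2_{q'}\bar{x}_q - 3\bar{x}_{q'}\bar{x}^2_q + \bar{x}^3_q$$

$$= \mu'_3(q') - 3\bar{x}_{q'}\sigma^2_{q'} - \bar{x}^3_{q'} - \big(\mu'_3(q) - 3\bar{x}_q\sigma^2_q - \bar{x}^3_q\big)$$

$$= \mu'_3(q') - 3\bar{x}_{q'}\mu'_2(q') + 2\bar{x}^3_{q'} - \big(\mu'_3(q) - 3\bar{x}_q\mu'_2(q) + 2\bar{x}^3_q\big)$$

$$= \mu_3(q') - \mu_3(q).$$

Thus

$$\mu_3(q'-q) = -\sigma^3\big(\mathfrak{b}'_3 - \mathfrak{b}_3\big). \tag{xxvii}$$

From this we deduce at once

$$\beta_1(q'-q) = \frac{(\mathfrak{b}'_3 - \mathfrak{b}_3)^2}{(\mathfrak{b}'_2 - \mathfrak{b}_2)^3}. \tag{xxvii bis}$$

We next take the fourth-order products with a view of determining the fourth moment coefficients.

(iii) $t = 1$, $s = 3$:

$$\{x_q x_{q'}^3\} = \sigma^4\big((\mathfrak{b}'_3 + 3\mathfrak{b}'_2\mathfrak{b}'_1 + \mathfrak{b}_1'^{\,3})\,\mathfrak{b}_1 + 3(\mathfrak{b}'_2 + \mathfrak{b}_1'^{\,2})\,\mathfrak{b}_2 + 3\mathfrak{b}'_1\mathfrak{b}_3 + \mathfrak{b}_4\big), \tag{xxviii}$$

$$= \mu_3(q')\bar{x}_q + 3\sigma^2_{q'}\bar{x}_{q'}\bar{x}_q + \bar{x}^3_{q'}\bar{x}_q + 3\sigma^2_{q'}\sigma^2_q + 3\bar{x}^2_{q'}\sigma^2_q + 3\bar{x}_{q'}\mu_3(q) + \mu_4(q) - 3\sigma^4_q. \tag{xxviii bis}$$

(iv) $t = 3$, $s = 1$:

$$\{x_q^3 x_{q'}\} = \sigma^4\,\frac{1}{\mathfrak{B}'}\,\frac{d}{dn}\Big(\mathfrak{B}'\big(\mathfrak{b}_3 + 3\mathfrak{b}_2\mathfrak{b}_1 + \mathfrak{b}_1^3\big)\Big)$$

$$= \sigma^4\big(\mathfrak{b}'_1(\mathfrak{b}_3 + 3\mathfrak{b}_2\mathfrak{b}_1 + \mathfrak{b}_1^3) + \mathfrak{b}_4 + 3\mathfrak{b}_2^2 + 3\mathfrak{b}_3\mathfrak{b}_1 + 3\mathfrak{b}_1^2\mathfrak{b}_2\big), \tag{xxix}$$

or

$$\{x_q^3 x_{q'}\} = \bar{x}_{q'}\mu_3(q) + 3\bar{x}_{q'}\bar{x}_q\sigma^2_q + \bar{x}_{q'}\bar{x}^3_q + \mu_4(q) - 3\sigma^4_q + 3\sigma^4_q + 3\bar{x}_q\mu_3(q) + 3\bar{x}^2_q\sigma^2_q. \tag{xxix bis}$$

(v) $t = 2$, $s = 2$:

$$\{x_q^2 x_{q'}^2\} = \sigma^4\,\frac{1}{\mathfrak{B}'}\,\frac{d^2}{dn^2}\left(\frac{\mathfrak{B}'}{\mathfrak{B}}\,\frac{d^2\mathfrak{B}}{dn^2}\right) = \sigma^4\,\frac{1}{\mathfrak{B}'}\,\frac{d^2}{dn^2}\Big(\mathfrak{B}'\big(\mathfrak{b}_2 + \mathfrak{b}_1^2\big)\Big)$$

$$= \sigma^4\big((\mathfrak{b}'_2 + \mathfrak{b}_1'^{\,2})(\mathfrak{b}_2 + \mathfrak{b}_1^2) + 2\mathfrak{b}'_1(\mathfrak{b}_3 + 2\mathfrak{b}_1\mathfrak{b}_2) + \mathfrak{b}_4 + 2\mathfrak{b}_2^2 + 2\mathfrak{b}_1\mathfrak{b}_3\big), \tag{xxx}$$

or

$$\{x_q^2 x_{q'}^2\} = \sigma^2_{q'}\sigma^2_q + \sigma^2_{q'}\bar{x}^2_q + \bar{x}^2_{q'}\sigma^2_q + \bar{x}^2_{q'}\bar{x}^2_q + 2\bar{x}_{q'}\mu_3(q) + 4\bar{x}_{q'}\bar{x}_q\sigma^2_q + \mu_4(q) - 3\sigma^4_q + 2\sigma^4_q + 2\bar{x}_q\mu_3(q). \tag{xxx bis}$$

By aid of (xxviii bis), (xxix bis) and (xxx bis) we find

$$\{(x_{q'} - x_q)^4\} = \{x_{q'}^4\} - 4\{x_{q'}^3 x_q\} + 6\{x_{q'}^2 x_q^2\} - 4\{x_{q'} x_q^3\} + \{x_q^4\}$$

$$= \mu'_4(q') - 4\mu_3(q')\bar{x}_q - 12\bar{x}_q\bar{x}_{q'}\sigma^2_{q'} - 4\bar{x}^3_{q'}\bar{x}_q - 12\sigma^2_q\sigma^2_{q'} - 12\bar{x}^2_{q'}\sigma^2_q$$

$$\quad - 12\bar{x}_{q'}\mu_3(q) - 4\mu_4(q) + 12\sigma^4_q + 6\sigma^2_{q'}\sigma^2_q + 6\bar{x}^2_{q'}\sigma^2_q + 6\bar{x}^2_q\sigma^2_{q'} + 6\bar{x}^2_q\bar{x}^2_{q'}$$

$$\quad + 12\bar{x}_{q'}\mu_3(q) + 24\bar{x}_{q'}\bar{x}_q\sigma^2_q + 6\mu_4(q) - 6\sigma^4_q + 12\bar{x}_q\mu_3(q) - 4\bar{x}_{q'}\mu_3(q)$$

$$\quad - 12\bar{x}_{q'}\bar{x}_q\sigma^2_q - 4\bar{x}_{q'}\bar{x}^3_q - 4\mu_4(q) - 12\bar{x}_q\mu_3(q) - 12\bar{x}^2_q\sigma^2_q + \mu'_4(q).$$

Hence remembering that $\mu'_4 = \mu_4 + 4\mu_3\bar{x} + 6\mu_2\bar{x}^2 + \bar{x}^4$ we have, on rearranging,

$$\mu'_4(q'-q) = \{(x_{q'} - x_q)^4\} = \mu_4(q') - 3\sigma^4_{q'} - \big(\mu_4(q) - 3\sigma^4_q\big) + 3\big(\sigma^2_{q'} - \sigma^2_q\big)^2$$

$$\quad + 4\big(\mu_3(q') - \mu_3(q)\big)(\bar{x}_{q'} - \bar{x}_q) + 6\big(\sigma^2_{q'} - \sigma^2_q\big)(\bar{x}_{q'} - \bar{x}_q)^2 + (\bar{x}_{q'} - \bar{x}_q)^4. \tag{xxxi}$$

Now

$$\mu_4(q'-q) = \big\{\big(x_{q'} - x_q - (\bar{x}_{q'} - \bar{x}_q)\big)^4\big\}$$

$$= \{(x_{q'} - x_q)^4\} - 4\{(x_{q'} - x_q)^3\}(\bar{x}_{q'} - \bar{x}_q) + 6\{(x_{q'} - x_q)^2\}(\bar{x}_{q'} - \bar{x}_q)^2 - 3(\bar{x}_{q'} - \bar{x}_q)^4$$

$$= \{(x_{q'} - x_q)^4\} - 4\big[\big(\mu_3(q') - \mu_3(q)\big)(\bar{x}_{q'} - \bar{x}_q) + 3\big(\sigma^2_{q'} - \sigma^2_q\big)(\bar{x}_{q'} - \bar{x}_q)^2 + (\bar{x}_{q'} - \bar{x}_q)^4\big]$$

$$\quad + 6\big[\big(\sigma^2_{q'} - \sigma^2_q\big)(\bar{x}_{q'} - \bar{x}_q)^2 + (\bar{x}_{q'} - \bar{x}_q)^4\big] - 3(\bar{x}_{q'} - \bar{x}_q)^4$$

$$= \{(x_{q'} - x_q)^4\} - 4\big(\mu_3(q') - \mu_3(q)\big)(\bar{x}_{q'} - \bar{x}_q) - 6\big(\sigma^2_{q'} - \sigma^2_q\big)(\bar{x}_{q'} - \bar{x}_q)^2 - (\bar{x}_{q'} - \bar{x}_q)^4.$$

Hence substituting from (xxxi) we find

$$\mu_4(q'-q) = \mu_4(q') - 3\sigma^4_{q'} - \big(\mu_4(q) - 3\sigma^4_q\big) + 3\big(\sigma^2_{q'} - \sigma^2_q\big)^2$$

$$= \sigma^4\big(\mathfrak{b}'_4 - \mathfrak{b}_4 + 3(\mathfrak{b}'_2 - \mathfrak{b}_2)^2\big). \tag{xxxii}$$

(5) Interrank Intervals.

Equations (xvii), (xxvii) and (xxxii) provide us with the chief constants* of the distribution of the interrank interval between ranks $q$ and $q'$, given $q' > q$ and $x_{q'} > x_q$. We may state them as follows, if $v = x_{q'} - x_q$:

$$\bar{v} = \sigma(\mathfrak{b}_1 - \mathfrak{b}'_1) = \sigma\left(\frac{1}{n-q} + \frac{1}{n-q-1} + \dots + \frac{1}{n-q'+1}\right), \tag{xxxiii}$$

$$\sigma_v^2 = \sigma^2(\mathfrak{b}'_2 - \mathfrak{b}_2) = \sigma^2\left(\frac{1}{(n-q)^2} + \frac{1}{(n-q-1)^2} + \dots + \frac{1}{(n-q'+1)^2}\right), \tag{xxxiv}$$

$$\beta_1 = \frac{(\mathfrak{b}'_3 - \mathfrak{b}_3)^2}{(\mathfrak{b}'_2 - \mathfrak{b}_2)^3} = \frac{4\left(\dfrac{1}{(n-q)^3} + \dfrac{1}{(n-q-1)^3} + \dots + \dfrac{1}{(n-q'+1)^3}\right)^{2}}{\left(\dfrac{1}{(n-q)^2} + \dfrac{1}{(n-q-1)^2} + \dots + \dfrac{1}{(n-q'+1)^2}\right)^{3}}\,\dagger, \tag{xxxv}$$

$$\beta_2 = 3 + \frac{\mathfrak{b}'_4 - \mathfrak{b}_4}{(\mathfrak{b}'_2 - \mathfrak{b}_2)^2} = 3 + \frac{6\left(\dfrac{1}{(n-q)^4} + \dfrac{1}{(n-q-1)^4} + \dots + \dfrac{1}{(n-q'+1)^4}\right)}{\left(\dfrac{1}{(n-q)^2} + \dfrac{1}{(n-q-1)^2} + \dots + \dfrac{1}{(n-q'+1)^2}\right)^{2}}. \tag{xxxvi}$$

* It would be worth investigating whether the $s$th semi-invariant is not the difference of the $q'$th and $q$th hyperbeta function of order $s + 1$.

† In order to obtain the sign of $\sqrt{\beta_1}$ we must remember that

$$\mu_3 = -\sigma^3(\mathfrak{b}'_3 - \mathfrak{b}_3) = 2\sigma^3\left(\frac{1}{(n-q)^3} + \frac{1}{(n-q-1)^3} + \dots + \frac{1}{(n-q'+1)^3}\right),$$

and this is essentially positive.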
It is clear that the sums of these inverse powers can be obtained at once from Tables I and II by taking the difference of two sums.

If we take the $q$th interval, i.e. the interval between the $q$th and $(q+1)$th ranks we have

$$\bar{v}_{q+1,\,q} = \frac{\sigma}{n-q},\qquad \sigma^2_{q+1,\,q} = \frac{\sigma^2}{(n-q)^2},\ \text{or}\ \sigma_{q+1,\,q} = \frac{\sigma}{n-q},\qquad \beta_1 = 4,\quad \beta_2 = 9, \tag{xxxvii}$$

or the distribution of the $q$th interval in samples of $n$ from an exponential curve is another exponential curve of standard deviation $\sigma/(n-q)$*.

As illustrations of the above theory let us consider samples of 11 and 51 drawn from an exponential curve, and consider the constants of the distribution of the range in the two cases. Here $q = 1$, and $q' = n$, and accordingly we require the sums of the inverse powers of the integers from $n - 1$ down to 1, or, $\dfrac{1}{10^{s}} + \dfrac{1}{9^{s}} + \dots + \dfrac{1}{1^{s}}$ in the first case and in the second case $\dfrac{1}{50^{s}} + \dfrac{1}{49^{s}} + \dots + \dfrac{1}{1^{s}}$. These are given at once by our Tables.


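Distribution of Range R in Samples of 11 and 51.

$$n = 11:\qquad \bar{R} = \sigma \times 2.9289{,}6825,\qquad \sigma_R^2 = \sigma^2 \times 1.5497{,}6773,\qquad \sigma_R = \sigma \times 1.244{,}897,$$

$$\beta_1 = \frac{4\,(1.1975{,}3199)^2}{(1.5497{,}6773)^3} = 1.54111,\qquad \beta_2 = 3 + \frac{6\,(1.0820{,}3658)}{(1.5497{,}6773)^2} = 5.70308.$$

$$n = 51:\qquad \bar{R} = \sigma \times 4.4992{,}0534,\qquad \sigma_R^2 = \sigma^2 \times 1.6251{,}3273,\qquad \sigma_R = \sigma \times 1.274{,}807,$$

$$\beta_1 = \frac{4\,(1.2018{,}6086)^2}{(1.6251{,}3273)^3} = 1.34618,\qquad \beta_2 = 3 + \frac{6\,(1.0823{,}2065)}{(1.6251{,}3273)^2} = 5.45884.$$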

Both these distributions show very sensible skewness and marked leptokurtosis. They indicate a curve of Pearson's Type VI†, i.e. $y = y_0\,(x-a)^{m_1}/x^{m_2}$ from $x = a$ to $x = \infty$. The two curves are the following:

\ 3-81602 // w \ 80*08094 

UJ 


«11: y - ^ 1-378,451 x 10 s * ^ -12-294,788)* 


* Putting $q = 0$, we obtain the result for the distribution of $x_1$, the "first" distance. See Biometrika, Vol. xx A, p. 228.

† $\kappa_2 = 2.04948$ and $1.58909$ for samples of 11 and 51 respectively. For Type VI, $\kappa_2$ must be $> 1$ and $< \infty$, and for Type V, $\kappa_2 = 1$. Hence the curves are Type VI, approximating to Type V.





the origin being at $14.738{,}037\,\sigma$ to the left of the mean;

N //r \6MM5 / /^s. 29*56889 

n = 51: 5502,329 xlO* 7 P- 8*422,628J /gj 

the origin being at $11.327{,}031\,\sigma$ to the left of the mean.

Figs. I and II indicate the characteristics of these curves for $N = 1000$.

The true forms of the Range Frequency Curves are given in Section (9) (ii) below; their ordinates are indicated by black circular dots on Figs. I and II, and the fit with Pearson curves calculated from the first four moments is remarkably close notwithstanding the difference in analytical form.

The values of the skewness (= (mean − mode)/standard deviation) are respectively .5261 and .4803, indicating only an 8% reduction in skewness for nearly a five-fold increase in size of sample. Even for a sample of 100 we have

$${}_R\beta_1 = 1.32231,\qquad {}_R\beta_2 = 5.42928,\qquad\text{giving skewness} = .47456,$$

and for $n \to \infty$,

$${}_R\beta_1 = 1.29857,\qquad {}_R\beta_2 = 5.40000,\qquad\text{and skewness} = .42876.$$

Thus even with very large samples, the frequency curve of ranges remains extremely skew, and differs widely from the normal curve.

The sample of 51 is almost as far from a normal distribution as the sample of 11. Even if we consider $n \to \infty$, we have

$$\sigma_R = \sigma \times 1.282{,}511,\quad\text{and as before}\quad {}_R\beta_1 = 1.29857,\quad {}_R\beta_2 = 5.40000,$$

for the constants of the limiting distribution. It will be seen that the standard deviation is finite, that the skewness and leptokurtosis are still considerable, but not very different from those of $n = 51$, but the mean of this finite curve has shifted to infinity since

$$\bar{R} = \sigma\left(\frac{1}{n} + \frac{1}{n-1} + \dots + \frac{1}{1}\right) = \infty\,{}^{*}\quad\text{as } n \to \infty.$$

In this case $\kappa_2 = .97511$, actually corresponding to a Type IV Curve, but so close to unity that a Type V would probably fit as well.
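The limiting constants of the range as n → ∞ depend only on the complete sums of inverse squares, cubes and fourth powers. As a hedged illustration (the closed forms π²/6 and π⁴/90 for the second and fourth sums are standard results not quoted in the text, and the cut-off `N` with its tail correction is an assumption of this sketch), they may be evaluated as follows; the results agree with the limiting figures quoted above to about four decimal places.

```python
import math

# Complete sums of inverse powers (n -> infinity).
sum_inv_sq = math.pi ** 2 / 6       # sum of 1/j**2
sum_inv_4th = math.pi ** 4 / 90     # sum of 1/j**4

# Sum of 1/j**3 has no elementary closed form: add the terms up to N,
# smallest first, and append the leading tail estimate 1/(2*N**2).
N = 100000
sum_inv_cube = sum(1.0 / j ** 3 for j in range(N, 0, -1)) + 1.0 / (2 * N * N)

sd_limit = math.sqrt(sum_inv_sq)                      # sigma_R / sigma
beta1_limit = 4 * sum_inv_cube ** 2 / sum_inv_sq ** 3
beta2_limit = 3 + 6 * sum_inv_4th / sum_inv_sq ** 2   # equals 3 + 12/5

if __name__ == "__main__":
    print(f"{sd_limit:.6f} {beta1_limit:.5f} {beta2_limit:.5f}")
```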

(6) Median and Interquartile Distances.

Let us now turn to the distribution of the median in samples of $4m + 3$. Here we have $q = 2m + 2$ and $n - q = 2m + 1$. We give the values for 3, 11, 23, 51, 75 and 99 for comparison. We find from (vi bis) and our Tables:

Size of Sample            Median

3                         σ × .833,333
11                        σ × .736,544
23                        σ × .714,414
51                        σ × .702,855+
75                        σ × .699,769
99                        σ × .698,172
999                       σ × .693,648
Asymptotic value ∞        σ × .693,147

The last value is found from the formula on p. 205, and $= 2.3025{,}8509\,\log_{10} 2$.

* In an infinitely great sample one individual or more from a non-limited curve like the exponential would have an infinite value, and thus the range and accordingly the mean range would be infinite.





It is clear that the larger the sample the more we depart from the mean of the parent population ($\bar{x} = \sigma$) and approach nearer and nearer to the asymptotic value. It is possible, however, to determine $\sigma$ and therefore the mean of the parent population from the median by dividing by the appropriate factor. It does not however follow that the median is the best single rank to find the mean from; that depends on the standard error of the rank. We will now consider the standard deviations of the median distributions.

Size of Sample            Standard Deviation        Standard Deviation of Mean
                          of Median                 as found from Median

3                         σ × .600,925              (σ/√n) × 1.24900
11                        σ × .307,280              (σ/√n) × 1.38367
23                        σ × .210,606              (σ/√n) × 1.41379
51                        σ × .140,690              (σ/√n) × 1.42949
75                        σ × .115,846              (σ/√n) × 1.43370
99                        σ × .100,753              (σ/√n) × 1.43586
999                       σ × .031,646−             (σ/√n) × 1.44201
Asymptotic value ∞ *      σ × .000,000              (σ/√n) × 1.44270

These values show that relatively to the usual method of finding the mean with
a standard deviation of $\sigma/\sqrt{n}$ the determination of the mean from the median is
25 % to 44 % less accurate, and the inaccuracy increases as we increase the size of
the sample, although of course the absolute error, owing to the factor $1/\sqrt{n}$,
decreases. If we ask whether the mean is better found from the range or the
median, we can get a partial answer from the data on p. 219 for the distribution of
range in samples of 11 and 51. We have for standard deviation of mean from range

$$\frac{\sigma}{\sqrt{n}}\cdot\frac{\sqrt{11} \times 1.244{,}897}{2.9289{,}6825} = \frac{\sigma}{\sqrt{n}} \times 1.40966, \quad \text{for samples of 11,}$$

$$\frac{\sigma}{\sqrt{n}}\cdot\frac{\sqrt{51} \times 1.274{,}807}{4.4992{,}0534} = \frac{\sigma}{\sqrt{n}} \times 2.0235, \quad \text{for samples of 51.}$$

Comparing these with the values $(\sigma/\sqrt{n}) \times 1.38367$ and $(\sigma/\sqrt{n}) \times 1.42949$, we see
that the mean is better determined from the median than from the range, and
the superiority of the former becomes greater as the sample increases.
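
The comparison of median and range can be checked numerically. The following is a minimal sketch in Python, assuming only that the mean and variance of the $q$th rank variate are the sums of inverse first and second powers of (vi bis) and (vii bis), and that the variance of an interval is the difference of the two rank variances. The function names are illustrative and are not taken from the paper.

```python
import math

def rank_mean(n, q):
    """Mean of the q-th rank variate in units of sigma: sum of 1/k, k = n-q+1 .. n."""
    return sum(1.0 / k for k in range(n - q + 1, n + 1))

def rank_var(n, q):
    """Variance of the q-th rank variate in units of sigma^2: sum of 1/k^2."""
    return sum(1.0 / k ** 2 for k in range(n - q + 1, n + 1))

def se_factor_from_median(n):
    """Factor c with s.d.(mean found from median) = c * sigma / sqrt(n), for n = 4m + 3."""
    q = (n + 1) // 2  # q = 2m + 2
    return math.sqrt(rank_var(n, q)) * math.sqrt(n) / rank_mean(n, q)

def se_factor_from_range(n):
    """Same factor when the mean is found from the range x_n - x_1."""
    mean_range = rank_mean(n, n) - rank_mean(n, 1)
    var_range = rank_var(n, n) - rank_var(n, 1)
    return math.sqrt(var_range) * math.sqrt(n) / mean_range

if __name__ == "__main__":
    for n in (11, 51):
        print(n, se_factor_from_median(n), se_factor_from_range(n))
```

The factor for the range grows much faster with $n$ than the factor for the median, which is the behaviour described in the text.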


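We now turn to the curves of distribution of the median. These will be
practically completed, if in addition to the information already provided we
determine $\beta_1$ and $\beta_2$.

* This is readily obtained from the second formula on p. 205. We have

$$\frac{\sigma_q^2}{\sigma^2} = \frac{1}{n^2} + \frac{1}{(n-1)^2} + \dots + \frac{1}{(n-q+1)^2} = -\frac{1}{n} + \frac{1}{n-q} + \text{etc.} = -\frac{1}{n} + \frac{2}{n} + \text{etc.} = \frac{1}{n} + \text{etc.}$$

$\therefore$ asymptotic value of s.d. of median $= \dfrac{\sigma}{\sqrt{n}}$, and of mean $= \dfrac{\sigma}{\sqrt{n} \times .693{,}147}$.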





β₁ and β₂ for Distribution of Median in Samples.

Size of Sample             Value of β₁     Value of β₂
3                          2.23032         6.62352
11                         .75693          4.17974
23                         .37720          3.58766
51                         .17359          3.27026
75                         .11867          3.18470
99                         .08654          3.14087
999                        .00896          3.01401
Asymptotic value n → ∞     9/n             3 + 14/n
n = ∞                      0               3


Were we dealing with the mean and not the median of a sample of $n$ from an
exponential curve, we should have the distribution curve of means with a $\beta_1$, $\beta_2$
given in terms of $B_1$, $B_2$ of the parent population by

$$\beta_1 = B_1/n = 4/n, \qquad \beta_2 - 3 = (B_2 - 3)/n = 6/n.$$

Thus while both median and mean tend as $n$ increases to be distributed normally,
they do not appear to approach that state by the same route.
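
The two routes to normality can be illustrated with a short computation. The sketch below is not Pearson's own procedure: it assumes that a rank variate from the exponential curve behaves as a sum of independent exponential spacings, so that its $r$th cumulant is $(r-1)!\,\sigma^r$ times the sum of inverse $r$th powers. The function names are hypothetical.

```python
import math

def rank_cumulant(n, q, r):
    """r-th cumulant (units of sigma^r) of the q-th rank variate in a sample of n.

    Assumption: the rank variate is a sum of independent exponential spacings,
    so the cumulant is (r-1)! * sum of k^(-r) for k = n-q+1 .. n.
    """
    return math.factorial(r - 1) * sum(1.0 / k ** r for k in range(n - q + 1, n + 1))

def betas(n, q):
    """beta_1 = k3^2 / k2^3 and beta_2 = 3 + k4 / k2^2 for the q-th rank variate."""
    k2, k3, k4 = (rank_cumulant(n, q, r) for r in (2, 3, 4))
    return k3 ** 2 / k2 ** 3, 3.0 + k4 / k2 ** 2

def betas_of_mean(n):
    """beta_1, beta_2 of the mean of n exponential variates: 4/n and 3 + 6/n."""
    return 4.0 / n, 3.0 + 6.0 / n

if __name__ == "__main__":
    for n in (11, 23, 51):
        print(n, betas(n, (n + 1) // 2), betas_of_mean(n))
```

For large $n$ the median values approach $9/n$ and $3 + 14/n$, against $4/n$ and $3 + 6/n$ for the mean.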

It may not be without interest to determine the degree of accuracy with which 
the mean can be found from the quartiles, confining our attention to the case of 
n = 4 m + 3. 


Size of Sample    Interquartile            Standard Deviation of      Standard Error of Mean found
                  Distance                 Interquartile Distance     from Interquartile Distance
3                 1.500,000 σ              1.118,034 σ                1.29099 σ/√n
11                1.217,857 σ              .526,709 σ                 1.43440 σ/√n
23                1.156,219 σ              .352,414 σ                 1.46176 σ/√n
51                1.124,691 σ              .232,346 σ                 1.47532 σ/√n
75                1.116,361 σ              .190,636 σ                 1.47887 σ/√n
99                1.112,064 σ              .165,493 σ                 1.48070 σ/√n
999               1.101,152 σ              .051,785 σ                 1.48641 σ/√n
Asymptotic *      (1.098,612 + 4/3n) σ     1.632,993 σ/√n             [1.632,993/(1.098,612 + 4/3n)] σ/√n
∞                 1.098,612 σ              0                          1.48641 σ/√n

* For the quartiles $n - q = \tfrac14(3n - 1)$, $n - q' = \tfrac14(n - 3)$ respectively.

Hence by the first formula on p. 205,

$$\frac{\bar{x}_{q'} - \bar{x}_q}{\sigma} = 2.3025{,}8509\,\log_{10} 3 + \log_e\left(1 - \frac{1}{3n}\right) - \log_e\left(1 - \frac{3}{n}\right) + \frac{2}{3n} - \frac{2}{n} + \text{etc.}$$

$$= 1.098{,}612 - \frac{1}{3n} + \frac{3}{n} + \frac{2}{3n} - \frac{2}{n} + \text{etc.}$$

$$= 1.098{,}612 + \frac{4}{3n} + \text{higher inverse powers of } n.$$


It will be clear that these results are indicative of slightly less accuracy in the
method of determining the mean from the interquartile distance, than in
determining it from the median. But remembering what much better results are
obtained for determining the median from the mean of the quartile values in the
case of a normal parent population, it is worth while considering what results we
obtain from the "centre" of the interquartile distance.

We first note that by (xxxiv), given any two ranks $q$ and $q'$, the (standard
deviation)$^2$ of their interval $i$ is given by

$$\sigma_i^2 = \sigma_{q'}^2 + \sigma_q^2 - 2\sigma_q\sigma_{q'}r_{qq'} = \sigma_{q'}^2 - \sigma_q^2, \quad \text{by (vii bis)}.$$

But if

$$c_i = \tfrac12(x_q + x_{q'}), \qquad \bar{c}_i = \tfrac12(\bar{x}_q + \bar{x}_{q'}),$$

then

$$\sigma_{c_i}^2 = \tfrac14\left(\sigma_q^2 + \sigma_{q'}^2 + 2\sigma_q\sigma_{q'}r_{qq'}\right) = \tfrac14\left(\sigma_{q'}^2 + 3\sigma_q^2\right),$$

or

$$\sigma_{c_i} = \tfrac12\sqrt{\sigma_{q'}^2 + 3\sigma_q^2}.$$



Hence if <r 2 (I < <r Ci will be < a it and since c { is always >i if x q be there 

is some chance of better results from this method of finding the mean. 

Let us see if the mean* of the parent population as found from the centre of
the quartiles is more accurate than the mean as found by one of the previous
methods.

Size of Sample n    Interquartile Centre        Standard Deviation          Standard Error of Mean
                    c̄_i                         σ_{c_i}                     as found from c_i
3                   σ × 1.083,3333              σ × .650,8541               1.04060 σ/√n
11                  σ × .910,9488               σ × .316,1731               1.15114 σ/√n
23                  σ × .872,7486               σ × .213,6105               1.17381 σ/√n
51                  σ × .853,2568               σ × .141,5987               1.18513 σ/√n
75                  σ × .848,0664               σ × .116,3588               1.18823 σ/√n
99                  σ × .845,3873               σ × .101,0914               1.18981 σ/√n
999                 σ × .837,8223               σ × .031,6570               1.19427 σ/√n
Asymptotic          σ (.836,9882 + 5/6n)        σ × (1/√n)(1 + 25/48n)      1.19476 [1 − …/n] σ/√n
∞                   σ × .836,9882               0                           1.19476 σ/√n


It will be seen that the value of the mean as found from the centre of the
interquartile distance is considerably better than when the mean is found from the
median (p. 222), and therefore still better than when it is found from the
interquartile distance itself and not from its centre (p. 223). It is not feasible with our
special definitions of the quartiles to give results for samples between 3 and 11,
but if we take a sample of 5 and put $q = 2$, $q' = 4$, we find for the standard error of
the mean as deduced from the centre of the distance between the second and
fourth ranks

$$\sigma_{\text{mean}} = 1.13282\ \sigma/\sqrt{n},$$

a result which fits well into the above series. It is clear, however, that the 4 %
increased inaccuracy of using ranking to find the mean rapidly mounts to 13 %
and more, as our samples are increased in size.

* The mean of the Exponential Curve = σ = its standard deviation. Hence to find the mean of the
parent population is also to find its s.d.
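
The entries for the centre of two ranks follow directly from $\bar{c}_i = \tfrac12(\bar{x}_q + \bar{x}_{q'})$ and $\sigma_{c_i}^2 = \tfrac14(\sigma_{q'}^2 + 3\sigma_q^2)$. A minimal sketch in Python, assuming the inverse-power sums of (vi bis) and (vii bis) for the exponential curve; the helper names are illustrative only.

```python
import math

def rank_mean(n, q):
    """Mean of the q-th rank variate in units of sigma."""
    return sum(1.0 / k for k in range(n - q + 1, n + 1))

def rank_var(n, q):
    """Variance of the q-th rank variate in units of sigma^2."""
    return sum(1.0 / k ** 2 for k in range(n - q + 1, n + 1))

def se_factor_from_centre(n, q, q_prime):
    """Factor c with s.e.(mean found from the centre of ranks q, q') = c * sigma / sqrt(n)."""
    centre = 0.5 * (rank_mean(n, q) + rank_mean(n, q_prime))
    var_centre = 0.25 * (rank_var(n, q_prime) + 3.0 * rank_var(n, q))
    return math.sqrt(var_centre) * math.sqrt(n) / centre

def quartile_ranks(n):
    """Quartile ranks for n = 4m + 3: n - q = (3n - 1)/4 and n - q' = (n - 3)/4."""
    return (n + 1) // 4, (3 * n + 3) // 4

if __name__ == "__main__":
    print(se_factor_from_centre(5, 2, 4))
    for n in (3, 11, 23):
        print(n, se_factor_from_centre(n, *quartile_ranks(n)))
```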


(7) Further Expansions for the Moments of Rank-Variates. 

Sufficient illustration has been provided of the relative ease with which problems 
in the ranking of sampled individuals from an exponential curve can be obtained 
from integer-argument values of the hyperbeta functions, or of the hypergamma
functions as provided by our Tables I and II. But these functions have considerable
interest from the standpoint of the theory of numbers. It is familiar knowledge 
that the sums of the inverse powers of the natural numbers can be obtained by the 
Euler-Maclaurin series, and the first four values are given on our p. 205. But as the 
moments of ranks in samples from the exponential curve can be expressed in terms 
of these sums, and these moments can also be obtained in series, convergent or 
finite, of other types, we can reach identities which may be of some interest to the 
pure mathematician. 

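In the first place, since $\alpha_x = 1 - e^{-x/\sigma}$, we have

$$x/\sigma = -\log_e(1 - \alpha_x) = \alpha_x + \tfrac12\alpha_x^2 + \tfrac13\alpha_x^3 + \dots + \tfrac1s\alpha_x^s + \dots = \sum_{s=1}^{\infty}\left(b_s\alpha_x^s\right), \quad \text{where } b_s = \frac{1}{s} \qquad \text{(xl)}.$$

Again,

$$(x/\sigma)^2 = \alpha_x^2 + \alpha_x^3 + \frac{11}{12}\alpha_x^4 + \frac{5}{6}\alpha_x^5 + \frac{137}{180}\alpha_x^6 + \frac{7}{10}\alpha_x^7 + \frac{363}{560}\alpha_x^8 + \frac{761}{1260}\alpha_x^9 + \frac{7{,}129}{12{,}600}\alpha_x^{10} + \frac{7{,}381}{13{,}860}\alpha_x^{11}$$

$$+ \frac{83{,}711}{166{,}320}\alpha_x^{12} + \frac{86{,}021}{180{,}180}\alpha_x^{13} + \frac{1{,}145{,}993}{2{,}522{,}520}\alpha_x^{14} + \dots = \sum_{s=0}^{\infty}c_{s+2}\,\alpha_x^{s+2} \qquad \text{(xli)},$$

where

$$c_{s+2} = \frac{2}{s+2}\left(1 + \frac12 + \frac13 + \dots + \frac{1}{s+1}\right)$$

and

$$c_{s+2} = \frac{s+1}{s+2}\,c_{s+1} + \frac{2}{(s+1)(s+2)},$$

while

$$c_1 = 0 \qquad \text{(xlii)}.$$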


These provide easy means of computing the successive terms for checking them. 
It is clear that neither series converges very rapidly, and we have to employ many 
terms in using them. 
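
The coefficients of (xli) can be checked term by term from the two expressions of (xlii). The sketch below uses exact rational arithmetic from the Python standard library; the function names are illustrative and the truncation at a fixed number of terms is an assumption made for the numerical comparison.

```python
import math
from fractions import Fraction

def c_direct(s):
    """c_s = (2/s) * (1 + 1/2 + ... + 1/(s-1)): coefficient of alpha^s in (x/sigma)^2."""
    return Fraction(2, s) * sum((Fraction(1, j) for j in range(1, s)), Fraction(0))

def c_recursive(s_max):
    """c_1 .. c_{s_max} from c_1 = 0 and c_{s+2} = (s+1)/(s+2) * c_{s+1} + 2/((s+1)(s+2))."""
    c = {1: Fraction(0)}
    for s in range(0, s_max - 1):  # each pass builds c_{s+2}
        c[s + 2] = Fraction(s + 1, s + 2) * c[s + 1] + Fraction(2, (s + 1) * (s + 2))
    return c

def square_series(alpha, terms):
    """Partial sum of (xli): sum of c_s * alpha^s for s = 2 .. terms."""
    return sum(float(c_direct(s)) * alpha ** s for s in range(2, terms + 1))

if __name__ == "__main__":
    table = c_recursive(14)
    for s in range(2, 15):
        print(s, table[s])
    print(square_series(0.3, 120), math.log(1 - 0.3) ** 2)
```

For $\alpha_x$ near unity many more terms are required, in agreement with the remark on slow convergence.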



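The convergence of the terms will be increased when we introduce $x$ and $x^2$
into the B-function expressions for ${}_n\bar{x}_q$ and $\mu_2'(n, q) = {}_n\sigma_q^2 + {}_n\bar{x}_q^2$. We have in fact

$$\frac{{}_n\bar{x}_q}{\sigma} = \sum_{s=1}^{\infty}\frac{1}{s}\,\frac{(q+s-1)!\;n!}{(q-1)!\,(n+s)!}$$

$$= \frac{q}{n+1} + \frac12\,\frac{q(q+1)}{(n+1)(n+2)} + \frac13\,\frac{q(q+1)(q+2)}{(n+1)(n+2)(n+3)} + \frac14\,\frac{q(q+1)(q+2)(q+3)}{(n+1)(n+2)(n+3)(n+4)} + \text{etc.} \qquad \text{(xliii)},$$

which series must therefore by (vi bis)

$$= \frac{1}{n} + \frac{1}{n-1} + \dots + \frac{1}{n-q+1}.$$

Again,

$$\frac{\mu_2'(n, q)}{\sigma^2} = \sum_{s=0}^{\infty}c_{s+2}\,\frac{(q+1+s)!\;n!}{(q-1)!\,(n+s+2)!}$$

$$= c_2\,\frac{q(q+1)}{(n+1)(n+2)} + c_3\,\frac{q(q+1)(q+2)}{(n+1)(n+2)(n+3)} + c_4\,\frac{q(q+1)(q+2)(q+3)}{(n+1)(n+2)(n+3)(n+4)} + \text{etc.} \qquad \text{(xliv)}$$

and by (vi bis) and (vii bis)

$$= \frac{1}{n^2} + \frac{1}{(n-1)^2} + \dots + \frac{1}{(n-q+1)^2} + \left(\frac{1}{n} + \frac{1}{n-1} + \dots + \frac{1}{n-q+1}\right)^2 \qquad \text{(xlv)},$$

and doubtless similar relations could be found for the higher moments. One further
result only may be cited:

$$\frac{\{{}_nx_q \times {}_nx_{q'}\}}{\sigma^2} = \frac{q}{n+1}\left(\frac{{}_{n+1}\bar{x}_{q'+1}}{\sigma}\right) + \frac12\,\frac{q(q+1)}{(n+1)(n+2)}\left(\frac{{}_{n+2}\bar{x}_{q'+2}}{\sigma}\right) + \frac13\,\frac{q(q+1)(q+2)}{(n+1)(n+2)(n+3)}\left(\frac{{}_{n+3}\bar{x}_{q'+3}}{\sigma}\right) + \text{etc.} \qquad \text{(xlvi)}.$$

Thus we have

$$\frac{q}{n+1}\left(\frac{1}{n+1} + \frac{1}{n} + \frac{1}{n-1} + \dots + \frac{1}{n-q'+1}\right) + \frac12\,\frac{q(q+1)}{(n+1)(n+2)}\left(\frac{1}{n+2} + \frac{1}{n+1} + \frac{1}{n} + \dots + \frac{1}{n-q'+1}\right)$$

$$+ \frac13\,\frac{q(q+1)(q+2)}{(n+1)(n+2)(n+3)}\left(\frac{1}{n+3} + \frac{1}{n+2} + \frac{1}{n+1} + \dots + \frac{1}{n-q'+1}\right) + \text{etc.}$$

$$= \frac{1}{n^2} + \frac{1}{(n-1)^2} + \dots + \frac{1}{(n-q+1)^2} + \left(\frac{1}{n} + \frac{1}{n-1} + \dots + \frac{1}{n-q+1}\right)\left(\frac{1}{n} + \frac{1}{n-1} + \dots + \frac{1}{n-q'+1}\right),$$

by (xv) if $q'$ be $> q$.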






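Hitherto the series we have been dealing with are not finite. We will now
express the infinite series in (xliii) and (xliv) by finite series.

Let the series in (xliii) be represented by

$$u_1 + u_2 + \dots + u_t + \dots,$$

where

$$u_t = \frac{1}{t}\,\frac{q(q+1)(q+2)\dots(q+t-1)}{(n+1)(n+2)\dots(n+t)} = \frac{1}{t}\,\frac{\Gamma(q+t)\,\Gamma(n+1)}{\Gamma(q)\,\Gamma(n+t+1)} = \frac{1}{t}\,\frac{\Gamma(n+1)}{\Gamma(q)\,\Gamma(n-q+1)}\cdot\frac{\Gamma(q+t)\,\Gamma(n-q+1)}{\Gamma(n+t+1)}$$

$$= \frac{1}{t}\,\frac{\Gamma(n+1)}{\Gamma(q)\,\Gamma(n-q+1)}\,B(q+t,\ n-q+1).$$

Hence

$$S = \sum_{t=1}^{\infty}u_t = \frac{\Gamma(n+1)}{\Gamma(q)\,\Gamma(n-q+1)}\int_0^1\sum_{t=1}^{\infty}\frac{z^t}{t}\,z^{q-1}(1-z)^{n-q}\,dz = -\frac{\Gamma(n+1)}{\Gamma(q)\,\Gamma(n-q+1)}\int_0^1 z^{q-1}\log(1-z)\,(1-z)^{n-q}\,dz.$$

Take $(1 - z) = e^{-u}$, $dz = e^{-u}\,du$, and when $z = 0$, $u = 0$; $z = 1$, $u = \infty$. Thus

$$S = \frac{\Gamma(n+1)}{\Gamma(q)\,\Gamma(n-q+1)}\int_0^{\infty}u\,(1-e^{-u})^{q-1}e^{-(n-q+1)u}\,du = \frac{\Gamma(n+1)}{\Gamma(q)\,\Gamma(n-q+1)}\int_0^{\infty}\sum_{s=0}^{q-1}\frac{(-1)^s\,(q-1)!}{s!\,(q-s-1)!}\,u\,e^{-(n-q+s+1)u}\,du.$$

Write $u' = (n - q + s + 1)\,u$, and we have

$$S = \frac{n!}{(n-q)!}\sum_{s=0}^{q-1}\frac{(-1)^s}{s!\,(q-s-1)!}\,\frac{1}{(n-q+s+1)^2}\int_0^{\infty}u'e^{-u'}\,du',$$

where the integral is simply unity. Hence

$$\frac{{}_n\bar{x}_q}{\sigma} = \frac{n!}{(n-q)!\,(q-1)!}\left[\frac{1}{(n-q+1)^2} - \frac{q-1}{1!}\,\frac{1}{(n-q+2)^2} + \frac{(q-1)(q-2)}{2!}\,\frac{1}{(n-q+3)^2} - \frac{(q-1)(q-2)(q-3)}{3!}\,\frac{1}{(n-q+4)^2} + \dots + (-1)^{q-1}\frac{1}{n^2}\right]$$

$$= \frac{1}{n} + \frac{1}{n-1} + \frac{1}{n-2} + \dots + \frac{1}{n-q+1} \qquad \text{(xlvii)}.$$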


Thus the series for ${}_n\bar{x}_q$ has been replaced by a finite series of $q$ terms, which by
aid of a table of the binomial coefficients and one of squares admits of easy
computation. In the case of $n = 14$ and $q = 10$, we find by the computing of ten terms

$${}_{14}\bar{x}_{10} = 1.168{,}228{,}993 \times \sigma\ {}^{*}.$$

* This agrees to the whole nine decimals with the value to be found from our Table of S (1/n) by
simply subtracting the value for n = 4 from that for n = 14.




Accuracy to the fifth decimal place is only obtained by 70 to 80 terms of the series
(xliv). In precisely similar manner we can obtain the $k$th moment coefficient of
the variate of the $q$th rank. In this case the integral is

$$\int_0^{\infty}u'^{\,k}e^{-u'}\,du' = \Gamma(k+1),$$

and the inverse powers are the $(k+1)$th, i.e.

$$\frac{\mu_k'}{\sigma^k} = \frac{n!\;\Gamma(k+1)}{(n-q)!\,(q-1)!}\left[\frac{1}{(n-q+1)^{k+1}} - \frac{q-1}{1!}\,\frac{1}{(n-q+2)^{k+1}} + \frac{(q-1)(q-2)}{2!}\,\frac{1}{(n-q+3)^{k+1}} - \frac{(q-1)(q-2)(q-3)}{3!}\,\frac{1}{(n-q+4)^{k+1}} + \dots + (-1)^{q-1}\frac{1}{n^{k+1}}\right] \qquad \text{(xlviii)}.$$

In particular

$$\mu_2' = \frac{n!\;2!\;\sigma^2}{(n-q)!\,(q-1)!}\left[\frac{1}{(n-q+1)^3} - \frac{q-1}{1!}\,\frac{1}{(n-q+2)^3} + \frac{(q-1)(q-2)}{2!}\,\frac{1}{(n-q+3)^3} - \frac{(q-1)(q-2)(q-3)}{3!}\,\frac{1}{(n-q+4)^3} + \dots + (-1)^{q-1}\frac{1}{n^3}\right] \qquad \text{(xlix)}.$$
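
The finite series (xlvii) to (xlix) lend themselves to exact evaluation. The following sketch, in Python with rational arithmetic, evaluates the alternating series and compares it with the inverse-power sums of (vi bis) and (xlv); the function names are illustrative and the binomial coefficient $(q-1)(q-2)\dots/s!$ is taken from `math.comb`.

```python
from fractions import Fraction
from math import comb, factorial

def raw_moment_finite(n, q, k):
    """k-th moment about zero of the q-th rank variate, in units of sigma^k,
    from the finite alternating series (xlviii)."""
    lead = Fraction(factorial(n) * factorial(k), factorial(n - q) * factorial(q - 1))
    series = sum(
        (Fraction((-1) ** s * comb(q - 1, s), (n - q + s + 1) ** (k + 1)) for s in range(q)),
        Fraction(0),
    )
    return lead * series

def inverse_power_sum(n, q, p):
    """Sum of 1/k^p for k = n-q+1 .. n, as an exact fraction."""
    return sum((Fraction(1, k ** p) for k in range(n - q + 1, n + 1)), Fraction(0))

if __name__ == "__main__":
    mean = raw_moment_finite(14, 10, 1)
    print(float(mean))  # mean of the 10th rank in samples of 14, in units of sigma
    second = raw_moment_finite(14, 10, 2)
    print(second == inverse_power_sum(14, 10, 2) + inverse_power_sum(14, 10, 1) ** 2)
```

Because the terms alternate in sign and the binomial factors grow large, exact fractions are a safer choice than floating point when $q$ is large.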


Now by (iii), reinstating the value of the standard deviation, we have

$$\frac{\mu_k'}{\sigma^k} = \frac{(-1)^k\;n!}{(q-1)!\,(n-q)!}\,\frac{d^k}{dn^k}B(q,\ n-q+1).$$

Thus it follows that

$$\frac{d^k}{dn^k}B(q,\ n-q+1) = (-1)^k\,\Gamma(k+1)\left\{\frac{1}{(n-q+1)^{k+1}} - \frac{q-1}{1!}\,\frac{1}{(n-q+2)^{k+1}} + \frac{(q-1)(q-2)}{2!}\,\frac{1}{(n-q+3)^{k+1}} - \frac{(q-1)(q-2)(q-3)}{3!}\,\frac{1}{(n-q+4)^{k+1}} + \dots + (-1)^{q-1}\frac{1}{n^{k+1}}\right\} \qquad \text{(l)}.$$

I have not come across this series of the $k$th differential of the B-function before,
but it may be known.

Since the differential coefficients of the B-function can all be expressed in 
terms of the hyperbeta functions, or in terms of inverse-power series, we have 
another long series of relations connecting the latter with a finite series of inverse 
powers of one higher order. Thus for example: 

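$$\frac{1}{n^2} + \frac{1}{(n-1)^2} + \dots + \frac{1}{(n-q+1)^2} + \left(\frac{1}{n} + \frac{1}{n-1} + \dots + \frac{1}{n-q+1}\right)^2$$

$$= \frac{n!\;2!}{(n-q)!\,(q-1)!}\left\{\frac{1}{(n-q+1)^3} - \frac{q-1}{1!}\,\frac{1}{(n-q+2)^3} + \frac{(q-1)(q-2)}{2!}\,\frac{1}{(n-q+3)^3} - \frac{(q-1)(q-2)(q-3)}{3!}\,\frac{1}{(n-q+4)^3} + \dots + (-1)^{q-1}\frac{1}{n^3}\right\}.$$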


It is difficult to state whether these results have mathematical interest or novelty; 
they can be supplemented by introducing the differences of the powers of zero! 

I have not succeeded in putting formula (xlvi) into a finite form. 

Before developing the solution in hyperbeta functions I had applied (xliii), 
(xliv) and (xlvi) to a number of numerical cases with a view to obtaining a general 
appreciation of the nature of rank-intervals in the case of samples from an 
exponential curve. One such illustration is, perhaps, worth preserving, partly 
because we can compare results obtained in it with samples of the same size from 
other curves, and partly because it may have some relation to matters of some 
interest in sport. 

(8) Galton's Prize Ratio for the Exponential Curve.

The distribution of runs in cricket certainly does not follow an exponential
curve, but for cricket teams of about the same grade it is not widely different.
The scores approximate to a J-curve with the maximum frequency at low scores
and a long tail of the centuries and above.

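Let the mean (or standard deviation) of the parent population be as previously
$\sigma$, then

$${}_{11}\bar{x}_{11} = \sigma \times 3.0198{,}7734, \qquad {}_{11}\bar{x}_{1} = \sigma \times .0909{,}0909,$$

and

$${}_{11}\bar{x}_{2} = \sigma \times .1909{,}0909, \qquad {}_{11}\bar{x}_{10} = \sigma \times 2.0198{,}7734.$$

Hence

$${}_{11}\bar{x}_{11} - {}_{11}\bar{x}_{10} = \sigma \times 1.0, \qquad {}_{11}\bar{x}_{2} - {}_{11}\bar{x}_{1} = \sigma \times 0.1.$$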


Thus, if the average cricket score were 20, the average interval between the two 
lowest scores would be two runs, but between the two highest scores 20 runs. 
This is an illustration from a skew-curve that the general principle found for 
samples from normal populations still holds true, i.e. that the difference between 
the best individuals is much greater than that between modal individuals. 

Now take ${}_{11}\bar{x}_{9} = \sigma \times 1.5198{,}7734$,

and ${}_{11}\bar{x}_{10} - {}_{11}\bar{x}_{9} = \sigma \times 0.5$.

Galton considered* that the ratio, $R$, of the value of the first to the second
prize (assuming there was no third) should be as the excess of the merit of the
first above the third to the excess of the merit of the second above the third, i.e.

$$R = \frac{{}_{11}\bar{x}_{11} - {}_{11}\bar{x}_{9}}{{}_{11}\bar{x}_{10} - {}_{11}\bar{x}_{9}} = \frac{1.5}{.5} = 3.$$

Galton, basing his conclusions partly on experiment, partly on the normal curve,
concluded that 75 % and 25 % of the prize money should be given to the first
and second competitors. There is something, however, unique in the case of the
exponential curve: Galton's Ratio remains constant whatever be the size of the sample*,
whereas in the case of the normal curve, we only approximate to the ratio 75 % to
25 % as we indefinitely increase the size of the sample.

* Biometrika, Vol. I. pp. 385 et seq.
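
The constancy of Galton's Ratio for the exponential curve is easy to confirm. A minimal sketch in Python, assuming the mean of the $q$th rank is the sum of inverse integers from $n - q + 1$ to $n$ as in (vi bis); the function names are hypothetical.

```python
from fractions import Fraction

def mean_rank(n, q):
    """Mean of the q-th rank variate in units of sigma, as an exact fraction."""
    return sum((Fraction(1, k) for k in range(n - q + 1, n + 1)), Fraction(0))

def galton_ratio(n):
    """Excess of the first over the third, divided by excess of the second over the third,
    where the 'first' is the highest rank n."""
    first, second, third = (mean_rank(n, n - j) for j in range(3))
    return (first - third) / (second - third)

if __name__ == "__main__":
    for n in (3, 11, 51, 999):
        print(n, galton_ratio(n))
```

The ratio depends only on the two topmost intervals, $\sigma/1$ and $\sigma/2$, which is why the sample size drops out.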

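Proceeding a stage further we have

$$\{{}_{11}x_{1}^2\} = {}_{11}\sigma_{1}^2 + {}_{11}\bar{x}_{1}^2 = .0165{,}2893 \times \sigma^2,$$

$$\{{}_{11}x_{2}^2\} = {}_{11}\sigma_{2}^2 + {}_{11}\bar{x}_{2}^2 = .0547{,}1074 \times \sigma^2,$$

$$\{{}_{11}x_{10}^2\} = {}_{11}\sigma_{10}^2 + {}_{11}\bar{x}_{10}^2 = 4.0799{,}0447 \times \sigma^2,$$

$$\{{}_{11}x_{11}^2\} = {}_{11}\sigma_{11}^2 + {}_{11}\bar{x}_{11}^2 = 10.6776{,}9142 \times \sigma^2,$$

$${}_{11}\sigma_{1} = .0909{,}0909 \times \sigma, \qquad {}_{11}\sigma_{2} = .1351{,}4608 \times \sigma,$$

$${}_{11}\sigma_{10} = .7470{,}1552 \times \sigma, \qquad {}_{11}\sigma_{11} = 1.2482{,}1163 \times \sigma.$$

$$\{{}_{11}x_{1} \times {}_{11}x_{2}\} = .0256{,}1983 \times \sigma^2,$$

$$\{{}_{11}x_{1} \times {}_{11}x_{11}\} = .2827{,}9877 \times \sigma^2.$$

These latter require an inordinate number of terms of (xlvi) to compute them to
eight figure accuracy.

Further,

$$r_{{}_{11}x_1,\,{}_{11}x_2} = \frac{\{{}_{11}x_{1} \times {}_{11}x_{2}\} - {}_{11}\bar{x}_{1} \times {}_{11}\bar{x}_{2}}{{}_{11}\sigma_{1} \times {}_{11}\sigma_{2}},$$

whence

$$r_{{}_{11}x_1,\,{}_{11}x_2} = \frac{.0256{,}1983 - .0909{,}0909 \times .1909{,}0909}{.0909{,}0909 \times .1351{,}4608} = .672{,}672,$$

agreeing practically with the value

$$r_{{}_{11}x_1,\,{}_{11}x_2} = \frac{{}_{11}\sigma_{1}}{{}_{11}\sigma_{2}} = \frac{.0909{,}0909}{.1351{,}4608} = .672{,}673$$

found by the recent shorter method of (xvi bis).

Similarly

$$r_{{}_{11}x_1,\,{}_{11}x_{11}} = \frac{\{{}_{11}x_{1} \times {}_{11}x_{11}\} - {}_{11}\bar{x}_{1} \times {}_{11}\bar{x}_{11}}{{}_{11}\sigma_{1} \times {}_{11}\sigma_{11}} = \frac{.2827{,}9877 - .0909{,}0909 \times 3.0198{,}7734}{.0909{,}0909 \times 1.2482{,}1163} = .072{,}8315,$$

while ${}_{11}\sigma_{1}/{}_{11}\sigma_{11} = .072{,}8316$,

only differing in the 7th decimal place.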


We see that the score of the 1st rank is highly correlated with the score of the
2nd rank, but its correlation with the score of the 11th rank is relatively small.
If we may suppose cricket scores to be roughly given by an exponential curve, we
might interpret this by saying that the best man getting a high score has not
much influence on the scores of the tail of the eleven, but that the tail of the
eleven will on the average score relatively well or relatively badly as a whole, their
score correlations being high. The actual explanation of the phenomena lies in
the fact that if the best get a high score this has to be carried down through the
middle members of the team, before space is gained for the expansion of the scores
of the tail, whereas if the last but one or last but two of the tail scores well there
is a wider space for the remainder, which on the average will have a higher score
also. Of course the correlation of the second man's score with the first man's
(i.e. the variate of the 10th with that of the 11th rank) is high also; it equals
${}_{11}\sigma_{10}/{}_{11}\sigma_{11} = .598{,}469$. In the same way ${}_{11}\sigma_{9} = .555{,}0065$, and the correlation of the
scores of the first and third man (i.e. of 11th and 9th rank variates) falls to .444,641,
and so on down the team till it reaches .072,832, for the scores of the first and
last man.

* It should be noted that whatever be the size of the sample the interrank intervals and their
standard errors remain constant for the same interval reckoned from the nth rank. Increasing the size
of the sample only pushes the series of intervals further towards the tail of the exponential, introducing
additional intervals behind these.
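
The correlations quoted for the eleven all follow from the shorter rule that the correlation of two rank variates is the ratio of the smaller to the larger standard deviation. A minimal sketch in Python under that rule, taking the variance of the $q$th rank as the sum of inverse squares; the function names are illustrative.

```python
import math

def sd_rank(n, q):
    """Standard deviation of the q-th rank variate in units of sigma."""
    return math.sqrt(sum(1.0 / k ** 2 for k in range(n - q + 1, n + 1)))

def rank_correlation(n, q, q_prime):
    """Correlation of the q-th and q'-th rank variates: sigma_q / sigma_q' with q < q'."""
    low, high = sorted((q, q_prime))
    return sd_rank(n, low) / sd_rank(n, high)

if __name__ == "__main__":
    for pair in ((1, 2), (10, 11), (9, 11), (1, 11)):
        print(pair, rank_correlation(11, *pair))
```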


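(9) Further Problems on Ranks and Rank-Intervals.

(i) To find the Curve of Frequency for the qth Rank-Variate.

The probability of the $q$th rank occurring in the element $d\alpha_x$ is

$$\frac{n!}{(q-1)!\,(n-q)!}\,\alpha_x^{q-1}\,d\alpha_x\,(1-\alpha_x)^{n-q}.$$

Hence the probability of the variate of the $q$th rank lying between $x$ and $x + dx$ is

$$\frac{n!}{(q-1)!\,(n-q)!\;\sigma}\,\left(1 - e^{-x/\sigma}\right)^{q-1}e^{-\frac{x}{\sigma}(n-q+1)}\,dx.$$

Or, the frequency curve for the $q$th Rank-Variate is

$$y = \frac{N}{\sigma}\,\frac{n!}{(q-1)!\,(n-q)!}\,\left(1 - e^{-x/\sigma}\right)^{q-1}e^{-\frac{x}{\sigma}(n-q+1)}.$$

If $q = 1$, the frequency curve is

$$y = \frac{Nn}{\sigma}\,e^{-\frac{nx}{\sigma}},$$

an exponential curve of standard deviation $\dfrac{\sigma}{n}$.

If $q = n$, the frequency curve is

$$y = \frac{Nn}{\sigma}\,\left(1 - e^{-x/\sigma}\right)^{n-1}e^{-\frac{x}{\sigma}}.$$

If $n$ be even, and $q = \tfrac12 n + 1$, the frequency curve is

$$y = \frac{N}{\sigma}\,\frac{n!}{(\tfrac12 n)!\,(\tfrac12 n - 1)!}\,\left(1 - e^{-x/\sigma}\right)^{\frac12 n}e^{-\frac{nx}{2\sigma}}.$$

This is the distribution of the "median."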

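(ii) To find the (q, q') Rank-Interval.

Here the probability of the $q$th rank occurring in $d\alpha_x$, and the $q'$th rank in
$d\alpha_{x'}$ is

$$\frac{n!}{(q-1)!\,(q'-q-1)!\,(n-q')!}\,\alpha_x^{q-1}\,d\alpha_x\,(\alpha_{x'} - \alpha_x)^{q'-q-1}\,(1 - \alpha_{x'})^{n-q'}\,d\alpha_{x'}\ {}^{*}.$$

* See p. 212 above.

Or, the probability of the $q$th and $q'$th variates occurring in $dx$ and $dx'$
respectively is

$$\frac{n!}{(q-1)!\,(q'-q-1)!\,(n-q')!}\,\frac{1}{\sigma^2}\,\left(1 - e^{-x/\sigma}\right)^{q-1}e^{-\frac{x}{\sigma}}\left(e^{-x/\sigma} - e^{-x'/\sigma}\right)^{q'-q-1}e^{-\frac{x'}{\sigma}(n-q'+1)}\,dx\,dx'.$$

Write $x' = R_{q,q'} + x$, and integrate $x$ between the limits 0 and $\infty$. Then the
frequency curve of ranges, $R_{q,q'}$, in $N$ samples of size $n$ is

$$y = \frac{N\,n!}{(q-1)!\,(q'-q-1)!\,(n-q')!}\,\frac{1}{\sigma^2}\,\left(1 - e^{-R_{q,q'}/\sigma}\right)^{q'-q-1}e^{-\frac{R_{q,q'}}{\sigma}(n-q'+1)} \times \int_0^{\infty}e^{-\frac{x}{\sigma}(n-q+1)}\left(1 - e^{-x/\sigma}\right)^{q-1}dx.$$

Take $z = e^{-x/\sigma}$ and

$$\int_0^{\infty}e^{-\frac{x}{\sigma}(n-q+1)}\left(1 - e^{-x/\sigma}\right)^{q-1}dx = \sigma\int_0^1 z^{n-q}(1-z)^{q-1}\,dz = \sigma\,B(n-q+1,\ q) = \sigma\,\frac{(n-q)!\,(q-1)!}{n!}.$$

Thus the frequency curve becomes

$$y = \frac{N\,(n-q)!}{\sigma\,(q'-q-1)!\,(n-q')!}\,\left(1 - e^{-R_{q,q'}/\sigma}\right)^{q'-q-1}e^{-\frac{R_{q,q'}}{\sigma}(n-q'+1)}.$$

If $q = 1$, $q' = n$, we have the frequency curve for the entire range $R$:

$$y = \frac{N(n-1)}{\sigma}\,\left(1 - e^{-R/\sigma}\right)^{n-2}e^{-R/\sigma}.$$

If $q' = q + 1$, we have the distribution of range of the $q$th interval, namely

$$y = \frac{N(n-q)}{\sigma}\,e^{-\frac{R_{q,q+1}}{\sigma}(n-q)},$$

which is an exponential curve of standard deviation $\dfrac{\sigma}{n-q}$ with mean $\bar{R}_{q,q+1} = \dfrac{\sigma}{n-q}$
also.

The relation of our interrank distance curve to the rank-variate curve is seen
at once if we put in the former $q = 0$, when it becomes

$$y = \frac{N\,n!}{\sigma\,(q'-1)!\,(n-q')!}\,\left(1 - e^{-R_{0,q'}/\sigma}\right)^{q'-1}e^{-\frac{R_{0,q'}}{\sigma}(n-q'+1)},$$

which is identical with the curve for the $q'$th rank-variate, if we take $R_{0,q'} = x_{q'}$,
i.e. take the zero-rank to be the start of the parent population.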
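
The frequency curve of the entire range can be checked by direct quadrature. The sketch below, in Python, takes $\sigma = 1$ and $N = 1$ and integrates the curve for $q = 1$, $q' = n$ with a simple midpoint rule; the cut-off at 60 units and the step count are assumptions chosen for this illustration.

```python
import math

def range_density(r, n):
    """Frequency curve of the entire range (q = 1, q' = n), with sigma = 1 and N = 1."""
    return (n - 1) * (1.0 - math.exp(-r)) ** (n - 2) * math.exp(-r)

def midpoint_integral(f, lo, hi, steps=100_000):
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

def range_area_and_mean(n, upper=60.0):
    """Total frequency and mean range obtained by numerical integration."""
    area = midpoint_integral(lambda r: range_density(r, n), 0.0, upper)
    mean = midpoint_integral(lambda r: r * range_density(r, n), 0.0, upper)
    return area, mean

if __name__ == "__main__":
    area, mean = range_area_and_mean(11)
    harmonic = sum(1.0 / k for k in range(1, 11))
    print(area, mean, harmonic)
```

The mean so found agrees with the sum of the inverse integers from 1 to $n - 1$, the value used for the mean range in samples of 11.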

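To indicate the relation to the hyperbeta functions, we have for the $s$th moment
coefficient about zero, writing $u$ for $R_{q,q'}$ for brevity,

$$\mu_s' = \frac{(n-q)!}{\sigma\,(q'-q-1)!\,(n-q')!}\int_0^{\infty}u^s\left(1 - e^{-u/\sigma}\right)^{q'-q-1}e^{-\frac{u}{\sigma}(n-q'+1)}\,du.$$

Put

$$e^{-u/\sigma} = z, \qquad u = -\sigma\log z, \qquad du = -\frac{\sigma}{z}\,dz,$$

and we find

$$\mu_s' = \frac{\sigma^s\,(-1)^s\,(n-q)!}{(q'-q-1)!\,(n-q')!}\,\frac{d^s}{dn^s}B(n-q'+1,\ q'-q) \qquad \text{(li)},$$

which connects the $s$th moment of $R_{q,q'}$ with the $s$th hyperbeta function.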


The object of the previous sections has been to demonstrate that the problem 
of ranks and rank-intervals can be fully solved for the exponential curve. The 
special interest of the exponential curve lies in the fact that it describes the 
distribution of the intervals between random occurrences in space or time*.
Accordingly if we arrange n such intervals in order of size, we can discuss many
properties of random occurrences by the formulae of this article, such as the mean
and variance of these intervals, and of the same quantities for the difference of 
intervals. It can hardly be doubted that such knowledge may be of value in 
determining the degree with which a given distribution of intervals in medical or 
social phenomena approximate to a random series. Thus a study of samples from 
an exponential series has a wider interest than may at first sight be attributed 
to it. 


(10) Ranks and Rank-Intervals for the ($\beta_1$, $\beta_2$) Biquadratic Curves.

The Rectangle and the Exponential Curve are only two special cases of the
curves which occur along the Biquadratic†

$$\beta_1(8\beta_2 - 9\beta_1 - 12)(\beta_2 + 3)^2 - (10\beta_2 - 12\beta_1 - 18)^2(4\beta_2 - 3\beta_1) = 0 \qquad \text{(lii)}$$

in the $\beta_1$, $\beta_2$ plane.

* See Whitworth, Choice and Chance, 6th Ed. p. 200.
† Phil. Trans. Vol. 216 A, pp. 436 et seq.

All these curves can have their $x$ easily expressed in terms of their $\alpha_x$, and so
the properties of their ranked samples can be obtained.

In my notation the following curves occur:

Type VIII.

$$y = y_0\left(1 + \frac{x}{a}\right)^{-m} \qquad \text{(liii)}.$$

This Type is found from the Rectangular Point along the upper branch of the
biquadratic loop.

The total range is from $x = 0$, where the ordinate is finite, to $x = -a$, when we
have an infinite (asymptotic) ordinate.

Here

$$m = 2(9 + 6\beta_1 - 5\beta_2)/(6 + 3\beta_1 - 2\beta_2) \qquad \text{(liv)},$$

and is always less than unity.

We may write our curve by change of origin in the form

$$y = \frac{N(1-m)}{a}\left(\frac{a}{x'}\right)^{m} \qquad \text{(lv)},$$

with range from $x' = 0$ to $a$. If $\sigma$ be the standard deviation,

$$a = \sigma(2 - m)\sqrt{\frac{3-m}{1-m}} \qquad \text{(lvi)},$$

and the mean $\bar{x}'$ is given by

$$\bar{x}' = a(1 - m)/(2 - m) = \sigma\sqrt{(1 - m)(3 - m)} \qquad \text{(lvii)}.$$

We have at once

$$\alpha_{x'} = \left(\frac{x'}{a}\right)^{1-m}, \quad \text{or} \quad x' = a\,(\alpha_{x'})^{\frac{1}{1-m}} \qquad \text{(lviii)}.$$

Since $m$ is $< 1$, $\dfrac{1}{1-m}$ is a positive power $> 1$. At $m = 0$, the Rectangle is
reached, and this has been fully discussed*.

Type IX.

$$y = y_0\left(1 + \frac{x}{a}\right)^{m} \qquad \text{(lix)}.$$

Passing from the Rectangle Point m is positive and less than unity until we 
reach the Line Point L, at which m is unity. Beyond this we have m greater than 
unity (Type IX t ) until m equals infinity, and we have the Exponential Curve 
already dealt with. 

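We can throw these curves together as

$$y = \frac{N(m+1)}{a}\left(\frac{x'}{a}\right)^{m} \qquad \text{(lx)},$$

with a range from zero ordinate at $x' = 0$ to a finite ordinate at $x' = a$.

Here

$$\bar{x}' = a(m + 1)/(m + 2) = \sigma\sqrt{(m + 1)(m + 3)} \qquad \text{(lxi)},$$

$$a = \sigma(m + 2)\sqrt{\frac{m+3}{m+1}} \qquad \text{(lxii)},$$

and

$$x' = a\,(\alpha_{x'})^{\frac{1}{m+1}} = \sigma(m + 2)\sqrt{\frac{m+3}{m+1}}\;(\alpha_{x'})^{\frac{1}{m+1}} \qquad \text{(lxiii)}.$$

Clearly the power of $\alpha_{x'}$ is a positive proper fraction.

Finally

$$m = \frac{2(5\beta_2 - 6\beta_1 - 9)}{3\beta_1 - 2\beta_2 + 6} \qquad \text{(lxiv)}.$$

Type XI. This is given by

$$y = y_0\,x^{-m} \qquad \text{(lxv)}.$$

Passing the Exponential Point, the curve takes the above form, where the
range is from $x = a$, a constant, to $\infty$, and

$$m = \frac{2(5\beta_2 - 6\beta_1 - 9)}{2\beta_2 - 3\beta_1 - 6} \qquad \text{(lxvi)},$$

and can take all positive values from $\infty$ to 5.

* Biometrika, Vol. XXIII. pp. 390-397.

Here

$$\bar{x} = a(m - 1)/(m - 2) = \sigma\sqrt{(m - 1)(m - 3)}, \qquad a = \sigma(m - 2)\sqrt{\frac{m-3}{m-1}} \qquad \text{(lxvii)},$$

and

$$\alpha_x = 1 - \left(\frac{a}{x}\right)^{m-1} \qquad \text{(lxviii)}.$$

Hence

$$x = \frac{a}{(1 - \alpha_x)^{\frac{1}{m-1}}} \qquad \text{(lxix)}$$

$$= \sigma(m - 2)\sqrt{\frac{m-3}{m-1}}\;\frac{1}{(1 - \alpha_x)^{\frac{1}{m-1}}} \qquad \text{(lxx)}.$$

Since $m$ is positive and greater than unity, $\dfrac{1}{m-1}$ is a positive proper fraction*.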
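
The statements (lii), (liv), (lxiv) and (lxvi) can be checked against the moments of the curves themselves. The following sketch, in Python with exact fractions, assumes a curve $y \propto x^{p}$ on a unit range (so that $p = m$ for Type IX and $p = -m$ for Type VIII) and the corresponding curve $y \propto x^{-m}$ from $x = 1$ to infinity for Type XI. The function names are illustrative only.

```python
from fractions import Fraction

def betas_from_raw(e1, e2, e3, e4):
    """beta_1 and beta_2 from the first four moments about the origin."""
    mu2 = e2 - e1 ** 2
    mu3 = e3 - 3 * e1 * e2 + 2 * e1 ** 3
    mu4 = e4 - 4 * e1 * e3 + 6 * e1 ** 2 * e2 - 3 * e1 ** 4
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

def betas_power_curve(p):
    """Curve y ~ x^p on (0, 1), p > -1: raw moments are (p + 1)/(p + k + 1)."""
    p = Fraction(p)
    return betas_from_raw(*[(p + 1) / (p + k + 1) for k in range(1, 5)])

def betas_type_xi(m):
    """Curve y ~ x^(-m) on (1, infinity), m > 5: raw moments are (m - 1)/(m - 1 - k)."""
    m = Fraction(m)
    return betas_from_raw(*[(m - 1) / (m - 1 - k) for k in range(1, 5)])

def biquadratic(b1, b2):
    """Left-hand side of (lii)."""
    return b1 * (8 * b2 - 9 * b1 - 12) * (b2 + 3) ** 2 - (10 * b2 - 12 * b1 - 18) ** 2 * (4 * b2 - 3 * b1)

def m_type_viii(b1, b2):
    return 2 * (9 + 6 * b1 - 5 * b2) / (6 + 3 * b1 - 2 * b2)   # (liv)

def m_type_ix(b1, b2):
    return 2 * (5 * b2 - 6 * b1 - 9) / (3 * b1 - 2 * b2 + 6)   # (lxiv)

def m_type_xi(b1, b2):
    return 2 * (5 * b2 - 6 * b1 - 9) / (2 * b2 - 3 * b1 - 6)   # (lxvi)

if __name__ == "__main__":
    print(betas_power_curve(1), m_type_ix(*betas_power_curve(1)))
    print(betas_type_xi(7), m_type_xi(*betas_type_xi(7)))
```

The rectangle ($\beta_1 = 0$, $\beta_2 = 1.8$) and the exponential curve ($\beta_1 = 4$, $\beta_2 = 9$) also satisfy (lii), as the text states.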

We will now consider these curves in order in relation to ranking. 

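Type VIII. Let ${}_n\bar{x}_q$ be the mean of the $q$th rank-variate in samples of size $n$,
and ${}_n\sigma_q$ its standard deviation, where we may drop the prefix $n$ if there be no
danger of confusion.

(i) To find ${}_n\bar{x}_q$:

$${}_n\bar{x}_q = a\,\frac{\Gamma(n+1)}{\Gamma(q)\,\Gamma(n-q+1)}\int_0^1\alpha_x^{\,q-1+\frac{1}{1-m}}\,(1-\alpha_x)^{n-q}\,d\alpha_x$$

$$= a\,\frac{\Gamma(n+1)}{\Gamma(q)\,\Gamma(n-q+1)}\,B\!\left(q + \frac{1}{1-m},\ n-q+1\right)$$

$$= a\,\frac{\Gamma(n+1)}{\Gamma(q)\,\Gamma(n-q+1)}\,\frac{\Gamma\!\left(q + \frac{1}{1-m}\right)\Gamma(n-q+1)}{\Gamma\!\left(n + 1 + \frac{1}{1-m}\right)}$$

$$= a\,\frac{\Gamma(n+1)\,\Gamma\!\left(q + \frac{1}{1-m}\right)}{\Gamma\!\left(n + 1 + \frac{1}{1-m}\right)\Gamma(q)} \qquad \text{(lxxi)}.$$

This can be found from any table of $\Gamma$-functions.

$$\mu_2'(n, q) = \sigma_q^2 + \bar{x}_q^2 = a^2\,\frac{\Gamma(n+1)}{\Gamma(q)\,\Gamma(n-q+1)}\int_0^1\alpha_x^{\,q-1+\frac{2}{1-m}}\,(1-\alpha_x)^{n-q}\,d\alpha_x = a^2\,\frac{\Gamma(n+1)\,\Gamma\!\left(q + \frac{2}{1-m}\right)}{\Gamma\!\left(n + 1 + \frac{2}{1-m}\right)\Gamma(q)} \qquad \text{(lxxii)}.$$

Similarly

$$\mu_s'(n, q) = a^s\,\frac{\Gamma(n+1)\,\Gamma\!\left(q + \frac{s}{1-m}\right)}{\Gamma\!\left(n + 1 + \frac{s}{1-m}\right)\Gamma(q)} \qquad \text{(lxxiii)},$$

from which, by reduction to the mean, all the higher moments and $\beta$'s can be
found.

* For a fuller discussion of these curves the reader is referred to the memoir already cited. It must
be remembered throughout that $\beta_2$ is given in terms of $\beta_1$ by the equation to the biquadratic.

Interesting special cases can be considered. For example the mean of the
variate of the first rank, i.e. $q = 1$, is

$${}_n\bar{x}_1 = \frac{a}{\left(1 + \frac{1}{n(1-m)}\right)\left(1 + \frac{1}{(n-1)(1-m)}\right)\left(1 + \frac{1}{(n-2)(1-m)}\right)\dots\left(1 + \frac{1}{1-m}\right)},$$

and the mean of the variate of the last rank

$${}_n\bar{x}_n = \frac{a}{1 + \frac{1}{n(1-m)}}.$$

The relation of these to the general value

$${}_n\bar{x}_q = \frac{a}{\left(1 + \frac{1}{n(1-m)}\right)\left(1 + \frac{1}{(n-1)(1-m)}\right)\left(1 + \frac{1}{(n-2)(1-m)}\right)\dots\left(1 + \frac{1}{q(1-m)}\right)} \qquad \text{(lxxiv)}$$

is clear.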
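
The $\Gamma$-function form (lxxi) and the product form (lxxiv) can be compared numerically. A minimal sketch in Python, writing $c$ for $1/(1-m)$ and using `math.lgamma` from the standard library; the function names and the default $a = 1$ are assumptions of this illustration.

```python
import math

def rank_mean_gamma(n, q, c, a=1.0):
    """a * Gamma(n+1) Gamma(q+c) / (Gamma(n+1+c) Gamma(q)), with c = 1/(1-m) for Type VIII."""
    log_value = (math.lgamma(n + 1) + math.lgamma(q + c)
                 - math.lgamma(n + 1 + c) - math.lgamma(q))
    return a * math.exp(log_value)

def rank_mean_product(n, q, c, a=1.0):
    """a divided by the product of (1 + c/k) for k = q .. n, as in (lxxiv)."""
    product = 1.0
    for k in range(q, n + 1):
        product *= 1.0 + c / k
    return a / product

if __name__ == "__main__":
    m = 0.5
    c = 1.0 / (1.0 - m)
    for q in (1, 4, 11):
        print(q, rank_mean_gamma(11, q, c), rank_mean_product(11, q, c))
```

With $m = 0$ both forms reduce to $a\,q/(n+1)$, the familiar result for the rectangle.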


We have difference relations of the form

$${}_n\bar{x}_{q-1} = \frac{{}_n\bar{x}_q}{1 + \frac{1}{(q-1)(1-m)}}, \qquad {}_n\bar{x}_q = \left(1 + \frac{1}{(q-1)(1-m)}\right){}_n\bar{x}_{q-1}.$$

The latter leads us to the mean interval between adjacent ranks, $q-1$th
and $q$th:

$${}_n\bar{x}_q - {}_n\bar{x}_{q-1} = {}_n\bar{x}_{q-1}\,\frac{1}{(q-1)(1-m)} = \frac{a}{(q-1)(1-m)\left(1 + \frac{1}{n(1-m)}\right)\left(1 + \frac{1}{(n-1)(1-m)}\right)\dots\left(1 + \frac{1}{(q-1)(1-m)}\right)}.$$

Again

$${}_n\bar{x}_{q-1} - {}_n\bar{x}_{q-2} = \frac{a}{(q-2)(1-m)\left(1 + \frac{1}{n(1-m)}\right)\left(1 + \frac{1}{(n-1)(1-m)}\right)\dots\left(1 + \frac{1}{(q-2)(1-m)}\right)}.$$

Or

$$\frac{{}_n\bar{x}_q - {}_n\bar{x}_{q-1}}{{}_n\bar{x}_{q-1} - {}_n\bar{x}_{q-2}} = \frac{q-2}{q-1}\left(1 + \frac{1}{(q-2)(1-m)}\right) = \frac{q-2}{q-1} + \frac{1}{(q-1)(1-m)} \qquad \text{(lxxv)}.$$

It is clear from this result that the ratio of interrank distances is independent*
of the size of the sample, but their actual values and their position do depend on
the size of the sample. Further we have

$$\frac{{}_n\bar{x}_q - {}_n\bar{x}_{q-2}}{{}_n\bar{x}_{q-1} - {}_n\bar{x}_{q-2}} = \frac{2q-3}{q-1} + \frac{1}{(q-1)(1-m)}.$$

Putting in succession $q = n$ and $q = 3$, we have

$$\frac{{}_n\bar{x}_n - {}_n\bar{x}_{n-2}}{{}_n\bar{x}_{n-1} - {}_n\bar{x}_{n-2}} = \frac{2n-3}{n-1} + \frac{1}{(n-1)(1-m)} = 2 + \frac{1}{n-1}\,\frac{m}{1-m} \qquad \text{(lxxvi)},$$

and

$$\frac{{}_n\bar{x}_3 - {}_n\bar{x}_2}{{}_n\bar{x}_2 - {}_n\bar{x}_1} = \frac12 + \frac{1}{2(1-m)} = \frac12\,\frac{2-m}{1-m},$$

$$\frac{{}_n\bar{x}_3 - {}_n\bar{x}_1}{{}_n\bar{x}_2 - {}_n\bar{x}_1} = \frac12\,\frac{2-m}{1-m} + 1 = \frac12\,\frac{4-3m}{1-m},$$

and thus

$$\frac{{}_n\bar{x}_3 - {}_n\bar{x}_1}{{}_n\bar{x}_3 - {}_n\bar{x}_2} = \frac{4-3m}{2-m} = 2 - \frac{m}{2-m} \qquad \text{(lxxvii)}.$$

Comparing (lxxvi) with (lxxvii) we see that Galton's Ratio is a little greater
than 2 at the stump ($x = a$) and somewhat less than 2 at the asymptotic end
($x = 0$). It is difficult to say which must be looked on as the prize, which the
penalty end of the rank series! We see that when $m = 0$ we get the rectangle,
and both ends of the series give an identical result, 2, for Galton's Ratio.
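
The two forms of Galton's Ratio for Type VIII can be verified from the product form of the rank means. A minimal sketch in Python, assuming the product of (lxxiv) with $a = 1$; the function names are hypothetical.

```python
def rank_mean(n, q, m):
    """Mean of the q-th rank for Type VIII with a = 1: reciprocal of the product of
    (1 + 1/(k(1-m))) for k = q .. n."""
    product = 1.0
    for k in range(q, n + 1):
        product *= 1.0 + 1.0 / (k * (1.0 - m))
    return 1.0 / product

def ratio_at_stump(n, m):
    """(x_n - x_{n-2}) / (x_{n-1} - x_{n-2}), the left-hand side of (lxxvi)."""
    x = [rank_mean(n, q, m) for q in (n, n - 1, n - 2)]
    return (x[0] - x[2]) / (x[1] - x[2])

def ratio_at_asymptotic_end(n, m):
    """(x_3 - x_1) / (x_3 - x_2), the left-hand side of (lxxvii)."""
    x1, x2, x3 = (rank_mean(n, q, m) for q in (1, 2, 3))
    return (x3 - x1) / (x3 - x2)

if __name__ == "__main__":
    for n in (11, 51):
        print(n, ratio_at_stump(n, 0.5), ratio_at_asymptotic_end(n, 0.5))
```

The ratio at the asymptotic end is the same for every $n$, while the ratio at the stump approaches 2 as the sample grows.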

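To complete the requisite formulae we may determine the value about the
origin of the curve of a product moment coefficient of any order. Writing $\chi$ for
$\Gamma(n+1)/[\Gamma(q)\,\Gamma(q'-q)\,\Gamma(n-q'+1)]$, we have

$$p'_{s,t} = \{{}_nx_q^{\,s} \times {}_nx_{q'}^{\,t}\} = a^{s+t}\chi\int_0^1 d\alpha_x\,\alpha_x^{\,q-1+\frac{s}{1-m}}\int_{\alpha_x}^1 d\alpha_{x'}\,(\alpha_{x'} - \alpha_x)^{q'-q-1}(1 - \alpha_{x'})^{n-q'}\,\alpha_{x'}^{\,\frac{t}{1-m}}$$

$$= a^{s+t}\chi\int_0^1 d\alpha_x\,\alpha_x^{\,q-1+\frac{s}{1-m}}\,u(q'-q-1) \qquad \text{(lxxviii)},$$

where

$$u(q'-q-1) = \int_{\alpha_x}^1(\alpha_{x'} - \alpha_x)^{q'-q-1}(1 - \alpha_{x'})^{n-q'}\,\alpha_{x'}^{\,\frac{t}{1-m}}\,d\alpha_{x'}.$$

Now

$$\frac{d\,u(q'-q-1)}{d\alpha_x} = -(q'-q-1)\int_{\alpha_x}^1 d\alpha_{x'}\,(\alpha_{x'} - \alpha_x)^{q'-q-2}(1 - \alpha_{x'})^{n-q'}\,\alpha_{x'}^{\,\frac{t}{1-m}} = -(q'-q-1)\,u(q'-q-2).$$

Integrate (lxxviii) by parts and we have

$$p'_{s,t} = a^{s+t}\chi\,\frac{q'-q-1}{q + \frac{s}{1-m}}\int_0^1 d\alpha_x\,\alpha_x^{\,q+\frac{s}{1-m}}\,u(q'-q-2).$$

If we repeat this integration by parts $q'-q-2$ times we find the factor $\alpha_{x'} - \alpha_x$
will disappear, the part outside the integration always vanishing. Thus

$$p'_{s,t} = a^{s+t}\chi\,\frac{q'-q-1}{q + \frac{s}{1-m}}\cdot\frac{q'-q-2}{q + 1 + \frac{s}{1-m}}\dots\frac{1}{q'-2+\frac{s}{1-m}} \times \int_0^1 d\alpha_x\,\alpha_x^{\,q'-2+\frac{s}{1-m}}\int_{\alpha_x}^1(1 - \alpha_{x'})^{n-q'}\,\alpha_{x'}^{\,\frac{t}{1-m}}\,d\alpha_{x'}.$$

Repeating the integration by parts once more we have

$$p'_{s,t} = a^{s+t}\chi\,\frac{\Gamma(q'-q)\,\Gamma\!\left(q + \frac{s}{1-m}\right)}{\Gamma\!\left(q' + \frac{s}{1-m}\right)}\int_0^1\alpha_{x'}^{\,q'-1+\frac{s+t}{1-m}}\,(1 - \alpha_{x'})^{n-q'}\,d\alpha_{x'}.$$

The integral is now the B-function $B\!\left(q' + \frac{s+t}{1-m},\ n-q'+1\right)$.

Replacing the B-function by $\Gamma$-functions and inserting the value of $\chi$, we have

$$\{{}_nx_q^{\,s} \times {}_nx_{q'}^{\,t}\} = a^{s+t}\,\frac{\Gamma(n+1)}{\Gamma\!\left(n + 1 + \frac{s+t}{1-m}\right)}\cdot\frac{\Gamma\!\left(q' + \frac{s+t}{1-m}\right)}{\Gamma\!\left(q' + \frac{s}{1-m}\right)}\cdot\frac{\Gamma\!\left(q + \frac{s}{1-m}\right)}{\Gamma(q)} \qquad \text{(lxxix)}.$$

If we put $q' = q$, and $t = 0$, we obtain the value of $\{{}_nx_q^{\,s}\} = \mu_s'(n, q)$ agreeing with
the result in (lxxiii). Expanding the $\Gamma$-functions and rearranging, we have

$$p'_{s,t}(n, q, q') = \{{}_nx_q^{\,s} \times {}_nx_{q'}^{\,t}\} = \frac{a^{s+t}}{\left(1 + \frac{s+t}{n(1-m)}\right)\left(1 + \frac{s+t}{(n-1)(1-m)}\right)\dots\left(1 + \frac{s+t}{q'(1-m)}\right)}$$

$$\times \frac{1}{\left(1 + \frac{s}{(q'-1)(1-m)}\right)\left(1 + \frac{s}{(q'-2)(1-m)}\right)\dots\left(1 + \frac{s}{q(1-m)}\right)} \qquad \text{(lxxx)}$$

$$= \mu_{s+t}'(n, q') \times \mu_s'(q'-1, q)/a^s \qquad \text{(lxxxi)}.$$

As a particular case, if $s = t = 1$,

$$\{{}_nx_q \times {}_nx_{q'}\} = \mu_2'(n, q') \times \mu_1'(q'-1, q)/a$$

$$= \frac{a}{\left(1 + \frac{2}{n(1-m)}\right)\left(1 + \frac{2}{(n-1)(1-m)}\right)\dots\left(1 + \frac{2}{q'(1-m)}\right)} \times \frac{a}{\left(1 + \frac{1}{(q'-1)(1-m)}\right)\left(1 + \frac{1}{(q'-2)(1-m)}\right)\dots\left(1 + \frac{1}{q(1-m)}\right)}$$

$$= {}_n\bar{x}_{q'}\!\left(\tfrac12(1+m)\right) \times {}_{q'-1}\bar{x}_q(m) \qquad \text{(lxxxii)},$$

where ${}_u\bar{x}_v(w)$ = the mean of the $v$th rank in a sample of size $u$, from a curve with
power $w$.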
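
The factorisation (lxxxi) and the $\Gamma$-function form (lxxix) can be compared directly. The sketch below is written in Python with $a = 1$ and $c = 1/(1-m)$ supplied as an exact fraction; the function names are illustrative and not from the paper.

```python
import math
from fractions import Fraction

def mu(n, q, s, c):
    """mu'_s(n, q) / a^s: reciprocal of the product of (1 + s*c/k) for k = q .. n."""
    product = Fraction(1)
    for k in range(q, n + 1):
        product *= 1 + Fraction(s) * c / k
    return 1 / product

def product_moment(n, q, q_prime, s, t, c):
    """Product moment {x_q^s x_q'^t} / a^(s+t) from the factorised form (lxxxi)."""
    return mu(n, q_prime, s + t, c) * mu(q_prime - 1, q, s, c)

def product_moment_gamma(n, q, q_prime, s, t, c):
    """The same quantity from the Gamma-function form (lxxix), in floating point."""
    c = float(c)
    lg = math.lgamma
    return math.exp(lg(n + 1) - lg(n + 1 + (s + t) * c)
                    + lg(q_prime + (s + t) * c) - lg(q_prime + s * c)
                    + lg(q + s * c) - lg(q))

if __name__ == "__main__":
    c = Fraction(2)  # corresponds to m = 1/2
    print(float(product_moment(11, 3, 9, 1, 1, c)), product_moment_gamma(11, 3, 9, 1, 1, c))
```

For the rectangle ($m = 0$, $c = 1$) the result reduces to $q(q'+1)/((n+1)(n+2))$.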






It is possible to write down at once the frequency curve for the variate of the
$q$th rank $x_q$. The chance of obtaining $\alpha_x$

$$= \frac{n!}{(q-1)!\,(n-q)!}\,\alpha_x^{\,q-1}(1 - \alpha_x)^{n-q}\,d\alpha_x,$$

but $\alpha_x = \left(\dfrac{x}{a}\right)^{1-m}$, therefore

Chance of obtaining $x_q$

$$= \frac{1-m}{a}\,\frac{n!}{(q-1)!\,(n-q)!}\left(\frac{x}{a}\right)^{q(1-m)-1}\left(1 - \left(\frac{x}{a}\right)^{1-m}\right)^{n-q}dx.$$

Or, the frequency distribution for $N$ samples of $n$ of the variate of the $q$th rank is

$$y = \frac{N(1-m)}{a}\,\frac{n!}{(q-1)!\,(n-q)!}\left(\frac{x}{a}\right)^{q(1-m)-1}\left(1 - \left(\frac{x}{a}\right)^{1-m}\right)^{n-q}.$$

This is a "transformed" Type I equation. If we multiply by $x^s$, and transform by
writing $z = (x/a)^{1-m}$, we obtain on integration the value of $\mu_s'$ given in Formula
(lxxiii). Putting $m = 0$ we obtain the curve for sampling from a Rectangular
Population, as given (with the mode as origin) in the first section of this paper*.

By changing $1 - m$ to $1 + m$, we obtain the corresponding frequency curve for
Type IX, and by writing $-(m - 1)$ for $1 - m$ ($m$ now $> 1$), we obtain that for
Type XI.

A short proof of the curve of distribution of the rank-variate may be obtained
as follows:

The curve of frequency for a rectangle is

$$y = \frac{N}{b},$$

and the element of frequency is $N\,dx/b$.

Transform this curve, the origin being at one terminal, by taking

$$\frac{x}{b} = \left(\frac{x'}{a}\right)^{1-m}, \qquad \frac{dx}{b} = \frac{1-m}{a}\left(\frac{x'}{a}\right)^{-m}dx'.$$

The transformed curve is now

$$y = \frac{N(1-m)}{a}\left(\frac{a}{x'}\right)^{m},$$

the form of our parent population. Since $x'$ is fixed by $x$, the rank of $x'$ will be the
same as that of $x$, but the frequency between $x$ and $x + \delta x$ of the $q$th rank is†

$$\frac{N\,n!}{(q-1)!\,(n-q)!}\left(\frac{x}{b}\right)^{q-1}\left(1 - \frac{x}{b}\right)^{n-q}\frac{\delta x}{b},$$

or transforming to $x'$,

$$\frac{N(1-m)}{a}\,\frac{n!}{(q-1)!\,(n-q)!}\left(\frac{x'}{a}\right)^{q(1-m)-1}\left(1 - \left(\frac{x'}{a}\right)^{1-m}\right)^{n-q}\delta x',$$

which gives us the same curve of frequency as before.

* See Biometrika, Vol. XXIII. p. 391, Formula (xxii).

† By an oversight in Biometrika, Vol. XXIII. p. 391, Formula (xxxii), (n - 1)! is printed in the
numerator for n!



240 


Rank- Variates and Rank-Intervals 


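Type IX.

$$y = y_0\left(\frac{x}{a}\right)^{m}, \qquad a_x = \left(\frac{x}{a}\right)^{m+1},$$

and

$$x = a\,(a_x)^{\frac{1}{m+1}}.$$

Here we have only to replace $\frac{1}{1-m}$ by $\frac{1}{m+1}$ in (lxxi), (lxxii) and (lxxix), with
$s = t = 1$, to obtain our results. We have

$${}_n\bar x_q = a\,\frac{\Gamma(n+1)\,\Gamma\!\left(q+\frac{1}{m+1}\right)}{\Gamma\!\left(n+1+\frac{1}{m+1}\right)\Gamma(q)} \tag{lxxxiii}$$

$${}_n\sigma^2_{x_q} + {}_n\bar x_q^{\,2} = a^2\,\frac{\Gamma(n+1)\,\Gamma\!\left(q+\frac{2}{m+1}\right)}{\Gamma\!\left(n+1+\frac{2}{m+1}\right)\Gamma(q)} \tag{lxxxiv}$$

$$\{{}_nx_q \times {}_nx_{q'}\} = a^2\,\frac{\Gamma(n+1)}{\Gamma\!\left(n+1+\frac{2}{m+1}\right)}\cdot\frac{\Gamma\!\left(q'+\frac{2}{m+1}\right)}{\Gamma\!\left(q'+\frac{1}{m+1}\right)}\cdot\frac{\Gamma\!\left(q+\frac{1}{m+1}\right)}{\Gamma(q)} \tag{lxxxv}$$

These may be replaced by

$${}_n\bar x_q = \frac{a}{\left(1+\frac{1}{n(m+1)}\right)\left(1+\frac{1}{(n-1)(m+1)}\right)\cdots\left(1+\frac{1}{q(m+1)}\right)} \tag{lxxxvi}$$

$${}_n\sigma^2_{x_q} + {}_n\bar x_q^{\,2} = \frac{a^2}{\left(1+\frac{2}{n(m+1)}\right)\left(1+\frac{2}{(n-1)(m+1)}\right)\cdots\left(1+\frac{2}{q(m+1)}\right)} \tag{lxxxvii}$$

$$\{{}_nx_q \times {}_nx_{q'}\} = {}_{q'-1}\bar x_q \times \frac{{}_n\sigma^2_{x_{q'}} + {}_n\bar x_{q'}^{\,2}}{a} \tag{lxxxviii}$$

$$= \frac{a}{\left(1+\frac{1}{(q'-1)(m+1)}\right)\left(1+\frac{1}{(q'-2)(m+1)}\right)\cdots\left(1+\frac{1}{q(m+1)}\right)} \times \frac{a}{\left(1+\frac{2}{n(m+1)}\right)\left(1+\frac{2}{(n-1)(m+1)}\right)\cdots\left(1+\frac{2}{q'(m+1)}\right)} \tag{lxxxix}$$

It will be noted that if ${}_n\bar x_q(m)$ denote the $q$th rank in a sample of $n$ from a
curve with power $m$, then

$${}_n\sigma^2_{x_q}(m) + {}_n\bar x_q^{\,2}(m) = a \times {}_n\bar x_q\!\left(\tfrac{1}{2}(m-1)\right),$$

and

$$\{{}_nx_q(m) \times {}_nx_{q'}(m)\} = {}_{q'-1}\bar x_q(m) \times {}_n\bar x_{q'}\!\left(\tfrac{1}{2}(m-1)\right) \tag{xc}$$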
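The equivalence of the Gamma-function form (lxxxiii) and the finite product (lxxxvi) is easy to confirm numerically. The following is a minimal sketch with $a = 1$ by default; the function names are hypothetical and the value $m = 2$ is chosen only for illustration.

```python
from math import lgamma, exp

def mean_rank_gamma(n, q, m, a=1.0):
    """Mean of the q-th rank for Type IX via the Gamma-function form (lxxxiii)."""
    lam = 1.0 / (m + 1.0)
    return a * exp(lgamma(n + 1) + lgamma(q + lam) - lgamma(n + 1 + lam) - lgamma(q))

def mean_rank_product(n, q, m, a=1.0):
    """Same mean via the finite product (lxxxvi), factors j = q, ..., n."""
    value = a
    for j in range(q, n + 1):
        value /= 1.0 + 1.0 / (j * (m + 1.0))
    return value

g = mean_rank_gamma(11, 3, 2.0)
p = mean_rank_product(11, 3, 2.0)
print(g, p)
```

Setting $m = 0$ in the product reduces it to $q/(n+1)$, the rectangular case.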










Karl Pearson 


241 


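Precisely as in the case of Type VIII we have

$$p'_{s,t} = \{x_q^{\,s}\,x_{q'}^{\,t}\} = a^{s+t}\,\frac{\Gamma(n+1)}{\Gamma\!\left(n+1+\frac{s+t}{m+1}\right)}\cdot\frac{\Gamma\!\left(q'+\frac{s+t}{m+1}\right)}{\Gamma\!\left(q'+\frac{s}{m+1}\right)}\cdot\frac{\Gamma\!\left(q+\frac{s}{m+1}\right)}{\Gamma(q)}$$

$$= \frac{a^{s+t}}{\left(1+\frac{s+t}{n(m+1)}\right)\left(1+\frac{s+t}{(n-1)(m+1)}\right)\cdots\left(1+\frac{s+t}{q'(m+1)}\right)} \times \frac{1}{\left(1+\frac{s}{(q'-1)(m+1)}\right)\left(1+\frac{s}{(q'-2)(m+1)}\right)\cdots\left(1+\frac{s}{q(m+1)}\right)} \tag{xci}$$

$$= \mu'_{s+t}(n, q') \times \mu'_s(q'-1, q)/a^s \tag{xcii}$$

and as particular cases we can deduce the three results already given by putting
q and (i) $s = 1$, $t = 0$, (ii) $s = 2$, $t = 0$, and (iii) $s = t = 1$.

All the results for Type VIII hold for Type IX, if we replace the $1 - m$ in the
former case by $m + 1$ in the latter. We must remember that $m$ may now range
from zero to infinity.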

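Type XI. Here the relationship between $x$ and $a_x$ changes its form and our
fundamental equations also:

$$\bar x_q = a\,\frac{\Gamma(n+1)}{\Gamma(q)\,\Gamma(n-q+1)}\int_0^1 a_x^{\,q-1}(1-a_x)^{n-q-\frac{1}{m-1}}\,da_x$$

$$= a\,\frac{\Gamma(n+1)}{\Gamma(q)\,\Gamma(n-q+1)}\cdot\frac{\Gamma(q)\,\Gamma\!\left(n-q+1-\frac{1}{m-1}\right)}{\Gamma\!\left(n+1-\frac{1}{m-1}\right)}$$

$$= a\,\frac{\Gamma(n+1)\,\Gamma\!\left(n-q+1-\frac{1}{m-1}\right)}{\Gamma(n-q+1)\,\Gamma\!\left(n+1-\frac{1}{m-1}\right)} \tag{xciii}$$

$$= \frac{a}{\left(1-\frac{1}{n(m-1)}\right)\left(1-\frac{1}{(n-1)(m-1)}\right)\left(1-\frac{1}{(n-2)(m-1)}\right)\cdots\left(1-\frac{1}{(n-q+1)(m-1)}\right)} \tag{xciv}$$

Similarly

$$\mu_2'(n, q) = \sigma^2_{x_q} + \bar x_q^{\,2} = a^2\,\frac{\Gamma(n+1)\,\Gamma\!\left(n-q+1-\frac{2}{m-1}\right)}{\Gamma(n-q+1)\,\Gamma\!\left(n+1-\frac{2}{m-1}\right)}$$

$$= \frac{a^2}{\left(1-\frac{2}{n(m-1)}\right)\left(1-\frac{2}{(n-1)(m-1)}\right)\left(1-\frac{2}{(n-2)(m-1)}\right)\cdots\left(1-\frac{2}{(n-q+1)(m-1)}\right)} \tag{xcv}$$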










242 


Rank- Variates and’ Rank-Intervals 


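In the same way it is perfectly easy to write down the expression for all the
higher moment coefficients about the start of the curve at $x = a$. For we have

$$\mu_s'(n, q) = \{x_q^{\,s}\} = a^s\,\frac{\Gamma(n+1)\,\Gamma\!\left(n-q+1-\frac{s}{m-1}\right)}{\Gamma(n-q+1)\,\Gamma\!\left(n+1-\frac{s}{m-1}\right)} \tag{xcvi}$$

$$= \frac{a^s}{\left(1-\frac{s}{n(m-1)}\right)\left(1-\frac{s}{(n-1)(m-1)}\right)\cdots\left(1-\frac{s}{(n-q+1)(m-1)}\right)} \tag{xcvii}$$

A word of warning is here requisite. $n$ is generally a fairly large number and
$m$ lies between 5 and infinity. We have, however, the factor $n - q + 1$, the lowest
value of which is unity, i.e. when $q$ is the $n$th rank. Supposing $m$ to approach the
limit 5, $m - 1 \to 4$, and accordingly with $s = 4$ the fourth moment coefficient would
approach infinity, and higher moments might become negative.

As before

$$\bar x_q - \bar x_{q-1} = \frac{a}{m-1}\cdot\frac{\Gamma(n+1)\,\Gamma\!\left(n-q+1-\frac{1}{m-1}\right)}{\Gamma(n-q+2)\,\Gamma\!\left(n+1-\frac{1}{m-1}\right)} \tag{xcviii}$$

If we put $q = 2$, and $q = n$ in succession, we have

$${}_n\bar x_2 - {}_n\bar x_1 = \frac{a}{m-1}\cdot\frac{n}{\left(n-\frac{1}{m-1}\right)\left(n-1-\frac{1}{m-1}\right)}$$

and

$${}_n\bar x_n - {}_n\bar x_{n-1} = \frac{a}{m-1}\cdot\frac{n!}{\left(n-\frac{1}{m-1}\right)\left(n-1-\frac{1}{m-1}\right)\cdots\left(1-\frac{1}{m-1}\right)} = \frac{a}{m-1}\cdot\frac{1}{\left(1-\frac{1}{n(m-1)}\right)\left(1-\frac{1}{(n-1)(m-1)}\right)\cdots\left(1-\frac{1}{m-1}\right)}$$

Hence

$$\frac{{}_n\bar x_2 - {}_n\bar x_1}{{}_n\bar x_n - {}_n\bar x_{n-1}} = \frac{1}{n-1}\left(1-\frac{1}{(n-2)(m-1)}\right)\left(1-\frac{1}{(n-3)(m-1)}\right)\cdots\left(1-\frac{1}{m-1}\right) \tag{xcix}$$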


Hence with a curve of this type, since $m > 1$ and all the factors in the numerator
must be less than unity, we see that the interval between the two highest individuals
is more than $(n-1)$ times the interval between the two lowest individuals.
For example, cricket scores follow a distribution not unlike that of the present
curve, and we might accordingly anticipate that the average difference in score
between the two best scorers in a sample of 11 would be upwards of 10 times the
average difference in score between the two worst scorers.
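The size of this ratio can be illustrated numerically. The sketch below assumes the Gamma-function form (xciii) for the Type XI means with $a = 1$ by default, and compares the ratio of the top to the bottom interval, computed directly, with the reciprocal of (xcix). The value $m = 6$ and the function names are illustrative assumptions only.

```python
from math import lgamma, exp

def type_xi_mean(n, q, m, a=1.0):
    """Mean of the q-th rank for Type XI, form (xciii); needs m - 1 > 1/(n - q + 1)."""
    lam = 1.0 / (m - 1.0)
    if n - q + 1 - lam <= 0:
        raise ValueError("mean does not exist for this rank")
    return a * exp(lgamma(n + 1) + lgamma(n - q + 1 - lam)
                   - lgamma(n - q + 1) - lgamma(n + 1 - lam))

def top_to_bottom_ratio(n, m):
    """(x_n - x_{n-1}) / (x_2 - x_1), the reciprocal of the product in (xcix)."""
    prod = 1.0
    for j in range(1, n - 1):          # j = 1, ..., n - 2
        prod *= 1.0 - 1.0 / (j * (m - 1.0))
    return (n - 1) / prod

n, m = 11, 6.0
direct = ((type_xi_mean(n, n, m) - type_xi_mean(n, n - 1, m))
          / (type_xi_mean(n, 2, m) - type_xi_mean(n, 1, m)))
print(direct, top_to_bottom_ratio(n, m))
```

The guard in `type_xi_mean` reflects the word of warning above: for the highest rank the moment of order $s$ ceases to exist once $s/(m-1)$ reaches unity.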







Karl Pearson 


243


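It remains to find the mean product moments of the variates of two ranks
$q$ and $q'$. Using $\chi$ as before for

$$\Gamma(n+1)/\Gamma(q)\,\Gamma(q'-q)\,\Gamma(n-q'+1),$$

we have

$$p'_{s,t} = \{x_q^{\,s} \times x_{q'}^{\,t}\} = a^{s+t}\,\chi\int_0^1 da_{x'}\,(1-a_{x'})^{n-q'-\frac{t}{m-1}}\; I_{q'-q-1},$$

where

$$I_{q'-q-1} = \int_0^{a_{x'}} da_x\; a_x^{\,q-1}(1-a_x)^{-\frac{s}{m-1}}(a_{x'}-a_x)^{q'-q-1}.$$

Integrating by parts as on previous occasions, we find

$$p'_{s,t} = a^{s+t}\,\chi\,\frac{(q'-q-1)(q'-q-2)\cdots 1}{\left(n-q'+1-\frac{t}{m-1}\right)\left(n-q'+2-\frac{t}{m-1}\right)\cdots\left(n-q-1-\frac{t}{m-1}\right)} \times \int_0^1 da_{x'}\,(1-a_{x'})^{n-q-1-\frac{t}{m-1}}\int_0^{a_{x'}} da_x\; a_x^{\,q-1}(1-a_x)^{-\frac{s}{m-1}}.$$

Integrate by parts once more and we find

$$p'_{s,t} = a^{s+t}\,\chi\,\frac{\Gamma(q'-q)\,\Gamma\!\left(n-q'+1-\frac{t}{m-1}\right)}{\Gamma\!\left(n-q+1-\frac{t}{m-1}\right)}\int_0^1 da_x\; a_x^{\,q-1}(1-a_x)^{n-q-\frac{s+t}{m-1}}$$

$$= a^{s+t}\,\frac{\Gamma(n+1)}{\Gamma(q)\,\Gamma(q'-q)\,\Gamma(n-q'+1)}\cdot\frac{\Gamma(q'-q)\,\Gamma\!\left(n-q'+1-\frac{t}{m-1}\right)}{\Gamma\!\left(n-q+1-\frac{t}{m-1}\right)}\cdot\frac{\Gamma(q)\,\Gamma\!\left(n-q+1-\frac{s+t}{m-1}\right)}{\Gamma\!\left(n+1-\frac{s+t}{m-1}\right)}$$

$$= a^{s+t}\,\frac{\Gamma(n+1)\,\Gamma\!\left(n-q'+1-\frac{t}{m-1}\right)\Gamma\!\left(n-q+1-\frac{s+t}{m-1}\right)}{\Gamma(n-q'+1)\,\Gamma\!\left(n-q+1-\frac{t}{m-1}\right)\Gamma\!\left(n+1-\frac{s+t}{m-1}\right)} \tag{c}$$

$$= \frac{a^{s+t}}{\left(1-\frac{t}{(n-q)(m-1)}\right)\left(1-\frac{t}{(n-q-1)(m-1)}\right)\cdots\left(1-\frac{t}{(n-q'+1)(m-1)}\right)} \times \frac{1}{\left(1-\frac{s+t}{n(m-1)}\right)\left(1-\frac{s+t}{(n-1)(m-1)}\right)\cdots\left(1-\frac{s+t}{(n-q+1)(m-1)}\right)} \tag{ci}$$

Here $m$ can take all values from $\infty$ down to 5*.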


(11) Numerical Illustrations of the Biquadratic Curves.

I propose to illustrate the formulae of the preceding sections on a variety of 
parent populations taken right round the biquadratic. 


* See Phil. Trans. Vol. 216 A, pp. 444-5.








244 


Rank - Variates and Rank-Intervals 


Illustration I. Type VIII.

Curve (i): $y = \dfrac{N}{5a}\left(\dfrac{a}{x}\right)^{.8}$.

Range $x = 0$ to $x = a$.
$\beta_1$ = 2.75, $\beta_2$ = 4.714,2857.
$\sigma$ = $a$ × .2512,5945, $a$ = $\sigma$ × 3.9799,4975.

The form of the curve is shown in the accompanying Fig. 1. The ranks are
reckoned from the asymptote at the origin towards the stump. We suppose here,
as in later curves, samples of 11 to be taken.

Galton Ratio: at stump end

$$\frac{{}_{11}\bar x_{11} - {}_{11}\bar x_{9}}{{}_{11}\bar x_{10} - {}_{11}\bar x_{9}} = 2.4,$$

at asymptotic start

$$\frac{{}_{11}\bar x_{3} - {}_{11}\bar x_{1}}{{}_{11}\bar x_{3} - {}_{11}\bar x_{2}} = \frac{4}{3};$$

therefore in such a case the prizes should be as 12 to 5, and the penalties as 4 to 3.
We use in the first place Equation (lxxiii) and find for $q$ = 1, 2, 10, 11 and $s$ = 1:

${}_{11}\bar x_1/a$ = .0002,2893,77, ${}_{11}\bar x_2/a$ = .0013,7362,64,

${}_{11}\bar x_{10}/a$ = .4583,3333,33, and ${}_{11}\bar x_{11}/a$ = .6875,0000,00.

Dropping prefixes for brevity, we have

$\bar x_2 - \bar x_1$ = .0011,4468,87, $\bar x_{11} - \bar x_{10}$ = .2291,6666,67,

and thus $(\bar x_{11} - \bar x_{10})/(\bar x_2 - \bar x_1)$ = 200.20.

It is such ratios of the intervals between the best men and between two mediocre
individuals which disconcert mediocrity, and make it believe that the high rankers
are not the extremes of their own population, but exceptionalities belonging to a



Karl Pearson 


245 


different category, which they label "genius." It is interesting to note that the high
order of this ratio is not confined to normal populations with centralised mediocrity
as in the measurement of intelligence, but extends to cases like the present with
mediocrity at one end of the scale as in the distribution of cricket scores, or of
incomes.
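The mean rank-variates of Curve (i) can be reproduced exactly with rational arithmetic. The sketch below assumes that for Type VIII with integer $1/(1-m)$ the Gamma-function form of (lxxiii) collapses to a finite product; `mean_rank` and `lam` are names introduced here for illustration.

```python
from fractions import Fraction

def mean_rank(n, q, lam):
    """Mean of the q-th rank as a fraction of a, for integer lam = 1/(1 - m)."""
    value = Fraction(1)
    for j in range(q, n + 1):
        value *= Fraction(j, j + lam)
    return value

n, lam = 11, 5                      # curve (i): m = .8, so 1/(1 - m) = 5
x = {q: mean_rank(n, q, lam) for q in range(1, n + 1)}
extreme = (x[11] - x[10]) / (x[2] - x[1])
print(float(x[1]), float(x[2]), float(x[10]), float(x[11]), float(extreme))
```

The last printed figure is the ratio of the highest to the lowest mean interval discussed in the text.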

Passing to the mean products of the second order, we must use (lxxiii) and
(lxxix), and these give, treating $a$ as unit,

$\{x_1^2\}$ = .0000,0283,51, $\{x_1x_2\}$ = .0000,0519,78, $\{x_2^2\}$ = .0000,3118,66,

$\{x_{10}^2\}$ = .2619,0476,19, $\{x_{10}x_{11}\}$ = .3492,0634,92, $\{x_{11}^2\}$ = .5238,0952,38,

$\{x_1x_{10}\}$ = .0001,3082,16, $\{x_1x_{11}\}$ = .0001,7442,87,

$\{x_2x_{10}\}$ = .0007,8492,94, $\{x_2x_{11}\}$ = .0010,4657,25.

From these we deduce, restoring $a$:

$\sigma^2_{x_1} = \{x_1^2\} - \bar x_1^2$ = $a^2$ × .0000,0278,27, or $\sigma_{x_1}$ = $a$ × .0016,6815,09.

Similarly $\sigma^2_{x_2}$ = $a^2$ × .0000,2929,98, or $\sigma_{x_2}$ = $a$ × .0054,1292,01,

$\sigma^2_{x_{10}}$ = $a^2$ × .0518,3531,91, $\sigma_{x_{10}}$ = $a$ × .2276,7371,19,

$\sigma^2_{x_{11}}$ = $a^2$ × .0511,5327,38, $\sigma_{x_{11}}$ = $a$ × .2261,7089,51.

The absolute variability increases from the first rank to the tenth, and drops for
the final rank. Thus at the sixth and ninth ranks we have

$\sigma_{x_6}$ = .0720,1062,09 and $\sigma_{x_9}$ = .1929,9154,70.

I can only explain this drop at the eleventh rank as due to the abrupt termination
of the curve.


If we take relative variabilities as given by the coefficient of variation

$$V_q = 100\,\sigma_{x_q}/\bar x_q,$$

we have $V_1$ = 728.646, $V_2$ = 394.061, $V_6$ = 124.818,

$V_9$ = 65.500, $V_{10}$ = 49.674, $V_{11}$ = 32.898,

or there is a continuous decrease in relative variation. The variation in the lowest
rank is extremely great owing to the small amount of the character the first individual
possesses, while that of the highest individual is 20 to 30 times less. Of
course all the variations are high, as we are dealing only with a very small sample.
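The coefficients of variation follow directly from the first two moments. The sketch below assumes the Gamma-function form of (lxxiii) with $a = 1$ and $\lambda = 1/(1-m) = 5$ for Curve (i); the function names are illustrative.

```python
from math import lgamma, exp, sqrt

def moment(n, q, s, lam):
    """{x_q^s} in units of a^s, taking (lxxiii) in its Gamma-function form."""
    return exp(lgamma(n + 1) + lgamma(q + s * lam) - lgamma(n + 1 + s * lam) - lgamma(q))

def coefficient_of_variation(n, q, lam):
    """V_q = 100 * sigma / mean for the q-th rank."""
    mean = moment(n, q, 1, lam)
    sigma = sqrt(moment(n, q, 2, lam) - mean ** 2)
    return 100.0 * sigma / mean

n, lam = 11, 5.0
for q in (1, 2, 6, 9, 10, 11):
    print(q, round(coefficient_of_variation(n, q, lam), 3))
```

Agreement with the printed values is to about the third decimal place; the last printed digit of the most variable rank may differ slightly.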

We will next consider the mean and variation of the first and last interrank 
intervals. 


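$\sigma^2_{x_2-x_1} = \{x_2^2\} - 2\{x_2x_1\} + \{x_1^2\} - (\bar x_2 - \bar x_1)^2$ = $a^2$ × .0000,2231,58.

Thus $\sigma_{x_2-x_1}$ = $a$ × .0047,2396,

while $\bar x_2 - \bar x_1$ = $a$ × .0011,4469.

Similarly $\sigma^2_{x_{11}-x_{10}}$ = $a^2$ × .0347,8422,60 and $\sigma_{x_{11}-x_{10}}$ = $a$ × .1865,0530,

while $\bar x_{11} - \bar x_{10}$ = $a$ × .2291,6667.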


It is thus fairly easy to obtain the mean and standard deviation of any rank 
interval. We note that while the absolute variation of the highest interval is far 
greater than that of the lowest, the relative variation is much less. 



246 


Rank-Variates and Rank-Intervals 


We may now determine various correlations. We have, generally,

$$r_{x_qx_{q'}} = \frac{\{x_qx_{q'}\} - \bar x_q\,\bar x_{q'}}{\sigma_{x_q}\,\sigma_{x_{q'}}},$$

and by using the mean products we obtain for $n = 11$

$r_{x_1x_2}$ = .540,810,27, $r_{x_1x_3}$ = .365,573,22, $r_{x_2x_3}$ = .675,973,14,

and again

$r_{x_1x_{10}}$ = .068,173,14, $r_{x_1x_{11}}$ = .045,148,76, $r_{x_{10}x_{11}}$ = .662,268,95.

These results indicate that the scores of the higher ranks are more closely correlated
than those of the lower, but the farther the ranks correlated are apart the lower
their correlation, this decreasing rapidly with the distance between them.


Further it will be found that the following relations hold:

$$r_{x_1x_3} = r_{x_1x_2}\,r_{x_2x_3}, \qquad r_{x_1x_{11}} = r_{x_1x_{10}}\,r_{x_{10}x_{11}}.$$

Seeing this in the numerical results, it drew my attention to the fact that if
$q < q' < q''$, we must always have

$$r_{x_qx_{q''}} = r_{x_qx_{q'}}\,r_{x_{q'}x_{q''}}$$

in any system of ranks from any parent population. It is simply the statement that
if you fix the variate of the $q'$th ranker, the partial correlation coefficient $r_{x_qx_{q''}\cdot x_{q'}}$
must be zero; for no variation of the position of the $q''$th ranker can affect the $q$th
ranker, if an intervening ranker occupies a fixed position.
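The product relation can be verified from the mean products alone. The sketch below assumes the Gamma-function forms of (lxxiii) and (lxxix) with $a = 1$ and $\lambda = 1/(1-m) = 5$ for Curve (i); `mean_product` and `correlation` are hypothetical helper names.

```python
from math import lgamma, exp, sqrt

def mean_product(n, q, qp, s, t, lam):
    """{x_q^s x_q'^t} for q <= q', a = 1, in the Gamma form used for (lxxix)."""
    return exp(lgamma(n + 1) - lgamma(n + 1 + (s + t) * lam)
               + lgamma(qp + (s + t) * lam) - lgamma(qp + s * lam)
               + lgamma(q + s * lam) - lgamma(q))

def correlation(n, q, qp, lam):
    """Correlation of the q-th and q'-th rank-variates, q < q'."""
    mq = mean_product(n, q, q, 1, 0, lam)
    mqp = mean_product(n, qp, qp, 1, 0, lam)
    vq = mean_product(n, q, q, 2, 0, lam) - mq ** 2
    vqp = mean_product(n, qp, qp, 2, 0, lam) - mqp ** 2
    cov = mean_product(n, q, qp, 1, 1, lam) - mq * mqp
    return cov / sqrt(vq * vqp)

n, lam = 11, 5.0
r12, r23, r13 = (correlation(n, 1, 2, lam), correlation(n, 2, 3, lam),
                 correlation(n, 1, 3, lam))
print(r12, r23, r13, r12 * r23)
```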

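We can prove this result generally for the present curve types from Formulae
(lxxiii) and (lxxix) or their equivalent forms, which may be written

$$\bar x_q = a\,\frac{\Gamma(n+1)}{\Gamma(n+1+\lambda)}\cdot\frac{\Gamma(q+\lambda)}{\Gamma(q)}, \qquad \{x_q^2\} = \mu_2' = a^2\,\frac{\Gamma(n+1)}{\Gamma(n+1+2\lambda)}\cdot\frac{\Gamma(q+2\lambda)}{\Gamma(q)},$$

$$\{x_qx_{q'}\} = a^2\,\frac{\Gamma(n+1)}{\Gamma(n+1+2\lambda)}\cdot\frac{\Gamma(q+\lambda)}{\Gamma(q)}\cdot\frac{\Gamma(q'+2\lambda)}{\Gamma(q'+\lambda)}.$$

Hence

$$r_{x_qx_{q'}} = \frac{1}{\sigma_{x_q}\sigma_{x_{q'}}}\cdot\frac{a^2\,\Gamma(n+1)\,\Gamma(q+\lambda)}{\Gamma(n+1+2\lambda)\,\Gamma(q)}\left(\frac{\Gamma(q'+2\lambda)}{\Gamma(q'+\lambda)} - \frac{\Gamma(n+1+2\lambda)\,\Gamma(n+1)\,\Gamma(q'+\lambda)}{\Gamma(n+1+\lambda)^2\,\Gamma(q')}\right)$$

$$= \frac{1}{\sigma_{x_q}\sigma_{x_{q'}}}\cdot\frac{\Gamma(q+\lambda)\,\Gamma(q')}{\Gamma(q)\,\Gamma(q'+\lambda)}\left(\frac{a^2\,\Gamma(n+1)\,\Gamma(q'+2\lambda)}{\Gamma(n+1+2\lambda)\,\Gamma(q')} - \left\{\frac{a\,\Gamma(n+1)\,\Gamma(q'+\lambda)}{\Gamma(n+1+\lambda)\,\Gamma(q')}\right\}^2\right)$$

$$= \frac{1}{\sigma_{x_q}\sigma_{x_{q'}}}\cdot\frac{\Gamma(q+\lambda)\,\Gamma(q')}{\Gamma(q)\,\Gamma(q'+\lambda)}\left(\{x_{q'}^2\} - \bar x_{q'}^{\,2}\right)$$

$$= \frac{\sigma_{x_{q'}}}{\sigma_{x_q}}\cdot\frac{\Gamma(q+\lambda)\,\Gamma(q')}{\Gamma(q)\,\Gamma(q'+\lambda)} \tag{cii}$$

Similarly

$$r_{x_{q'}x_{q''}} = \frac{\sigma_{x_{q''}}}{\sigma_{x_{q'}}}\cdot\frac{\Gamma(q'+\lambda)\,\Gamma(q'')}{\Gamma(q')\,\Gamma(q''+\lambda)},$$

and therefore

$$r_{x_qx_{q'}}\,r_{x_{q'}x_{q''}} = \frac{\sigma_{x_{q''}}}{\sigma_{x_q}}\cdot\frac{\Gamma(q+\lambda)\,\Gamma(q'')}{\Gamma(q)\,\Gamma(q''+\lambda)} = r_{x_qx_{q''}} \tag{ciii}$$

thus the theorem is proved generally.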
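A compact way of reading (cii), offered here as a worked note rather than as part of the original argument, is that the Gamma-function factor is simply the ratio of the two mean rank-variates, so that the correlation becomes a ratio of coefficients of variation. The symbol $V_q = 100\,\sigma_{x_q}/\bar x_q$ is that used in Illustration I.

```latex
% From the mean formula, for q < q':
\frac{\bar x_q}{\bar x_{q'}}
  = \frac{\Gamma(q+\lambda)\,\Gamma(q')}{\Gamma(q)\,\Gamma(q'+\lambda)} .
% Substituting in (cii):
r_{x_q x_{q'}}
  = \frac{\sigma_{x_{q'}}}{\sigma_{x_q}}\cdot\frac{\bar x_q}{\bar x_{q'}}
  = \frac{\sigma_{x_{q'}}/\bar x_{q'}}{\sigma_{x_q}/\bar x_q}
  = \frac{V_{q'}}{V_q} .
% The product rule (ciii) is then immediate:
r_{x_q x_{q'}}\, r_{x_{q'} x_{q''}}
  = \frac{V_{q'}}{V_q}\cdot\frac{V_{q''}}{V_{q'}}
  = \frac{V_{q''}}{V_q}
  = r_{x_q x_{q''}} .
```

For Curve (i) this gives $r_{x_1x_2} = 394.061/728.646$, or about .5408, in agreement with the value found from the mean products.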





Karl Pearson 


247 


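If we put $\lambda = 0$, we obtain the exponential curve, and remembering the change
in the sense of the axis of that curve, we obtain the result on our p. 214 (xvi bis),
where $\sigma_{x_{q'}}$ is now less than $\sigma_{x_q}$, the order of ranking being reversed.

As corollaries we may note that for adjacent ranks

$$r_{q,\,q+1} = \frac{\sigma_{q+1}}{\sigma_q}\cdot\frac{q}{q+\lambda} \tag{civ}$$

$$r_{q,\,q'} = r_{q,\,q+1}\; r_{q+1,\,q+2}\; r_{q+2,\,q+3} \cdots r_{q'-1,\,q'} = \frac{q}{q+\lambda}\cdot\frac{q+1}{q+\lambda+1}\cdots\frac{q'-1}{q'-1+\lambda}\cdot\frac{\sigma_{q'}}{\sigma_q} = \frac{\Gamma(q')\,\Gamma(q+\lambda)}{\Gamma(q)\,\Gamma(q'+\lambda)}\cdot\frac{\sigma_{q'}}{\sigma_q} \tag{cv}$$

as before.

For any two rank-intervals

$$r_{x_{q'}-x_q,\;x_{q'''}-x_{q''}}\;\sigma_{x_{q'}-x_q}\,\sigma_{x_{q'''}-x_{q''}} = \left(\frac{\Gamma(q'+\lambda)}{\Gamma(q')} - \frac{\Gamma(q+\lambda)}{\Gamma(q)}\right)\left(\sigma^2_{x_{q'''}}\frac{\Gamma(q''')}{\Gamma(q'''+\lambda)} - \sigma^2_{x_{q''}}\frac{\Gamma(q'')}{\Gamma(q''+\lambda)}\right) \tag{cvi}$$

which presents some simplification in computing the correlation of interrank
intervals ($q < q' < q'' < q'''$).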

Returning to the special example illustrated numerically on p. 245, I wrote
down the regression lines of $x_{10}$ on $x_{11}$ and of $x_1$ on $x_{11}$, and giving various values to
$x_{11}$, I invariably found that the percentage increase of $x_{10}$ on $\bar x_{10}$ or $x_1$ on $\bar x_1$ was
precisely the percentage by which I had increased $x_{11}$ on $\bar x_{11}$. In other words I was
forced to the conclusion that the general regression line of $x_q$ on $x_{q'}$ must be

$$\frac{x_q}{\bar x_q} = \frac{x_{q'}}{\bar x_{q'}} \tag{cvii}$$

Having reached this result numerically, it can easily be proved algebraically*. For

$$r_{qq'} = \frac{\Gamma(q')\,\Gamma(q+\lambda)}{\Gamma(q)\,\Gamma(q'+\lambda)}\cdot\frac{\sigma_{q'}}{\sigma_q},$$

and the terms in $q'$, $q$ = $\bar x_q/\bar x_{q'}$; hence

$$x_q - \bar x_q = \frac{\bar x_q}{\bar x_{q'}}\,(x_{q'} - \bar x_{q'}),$$

or,

$$\frac{x_q}{\bar x_q} = \frac{x_{q'}}{\bar x_{q'}}.$$


This form of relationship is free from the parameters of the parent curve, and
suggests that when we increase the total range of the sample we shall on the
average increase all the subranges in the same proportion. How far this property
extends beyond the curve types on the ($\beta_1$, $\beta_2$) biquadratic I am not at present
able to state. The difficulty arises from the need of a fixed terminal to measure $x_q$
from. It would be interesting, however, to investigate further the distribution of
rank-intervals in the case of other curves.
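The regression result can be confirmed by computing the slope from the moments. The sketch below assumes the Gamma-function forms of (lxxiii) and (lxxix) with $a = 1$ and $\lambda = 5$ for Curve (i); the function names are introduced only for this illustration.

```python
from math import lgamma, exp

def moment(n, q, s, lam):
    """{x_q^s} for a = 1."""
    return exp(lgamma(n + 1) + lgamma(q + s * lam) - lgamma(n + 1 + s * lam) - lgamma(q))

def cross_moment(n, q, qp, lam):
    """{x_q x_q'} for q < q', a = 1."""
    return exp(lgamma(n + 1) - lgamma(n + 1 + 2 * lam)
               + lgamma(qp + 2 * lam) - lgamma(qp + lam)
               + lgamma(q + lam) - lgamma(q))

def regression_slope(n, q, qp, lam):
    """Slope of the regression of x_q on x_q': covariance over variance of x_q'."""
    cov = cross_moment(n, q, qp, lam) - moment(n, q, 1, lam) * moment(n, qp, 1, lam)
    var_qp = moment(n, qp, 2, lam) - moment(n, qp, 1, lam) ** 2
    return cov / var_qp

n, lam = 11, 5.0
slope = regression_slope(n, 10, 11, lam)
ratio_of_means = moment(n, 10, 1, lam) / moment(n, 11, 1, lam)
print(slope, ratio_of_means)
```

A slope equal to the ratio of the means is exactly the statement that a given percentage change in $x_{q'}$ produces the same percentage change in the expected $x_q$.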

* I might have removed the numerical road to the theorem, but I think it may suggest to others
that mere computing work may occasionally lead one to analytical theorems.







248 


Rank-Variates and Rank-Intervals 


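Still dealing with the same numerical example, i.e. Curve (i), $y = \dfrac{N}{5a}\left(\dfrac{a}{x}\right)^{.8}$, we may
indicate how the correlation is determined between any two rank-intervals; let,
say, the first and last,

$$r_{x_2-x_1,\;x_{11}-x_{10}} = \frac{\{(x_2-x_1)(x_{11}-x_{10})\} - (\bar x_2-\bar x_1)(\bar x_{11}-\bar x_{10})}{\sigma_{x_2-x_1}\times\sigma_{x_{11}-x_{10}}}.$$

Now $\{(x_2-x_1)(x_{11}-x_{10})\} = \{x_2x_{11}\} + \{x_1x_{10}\} - \{x_1x_{11}\} - \{x_2x_{10}\}$.

Substituting the values from pp. 244 and 245 we find

$r_{x_2-x_1,\;x_{11}-x_{10}}$ = −.050,268.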


The negative correlation is due I think to the fact that the total range being 
limited, an increased first interrank must denote a decreased last interrank interval. 
The two intervals, however, being far apart, there is only a very small correlation 
between their magnitudes. The correctness of this explanation will be confirmed 
as we take further curves round the biquadratic. 
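The computation of this interval correlation may be sketched as follows, assuming the Gamma-function forms of (lxxiii) and (lxxix) with $a = 1$ and $\lambda = 5$ for Curve (i). The helper names are hypothetical.

```python
from math import lgamma, exp, sqrt

def mp(n, q, qp, lam):
    """{x_q x_q'} for q <= q' with a = 1 (q = q' gives the second moment)."""
    return exp(lgamma(n + 1) - lgamma(n + 1 + 2 * lam)
               + lgamma(qp + 2 * lam) - lgamma(qp + lam)
               + lgamma(q + lam) - lgamma(q))

def mean(n, q, lam):
    return exp(lgamma(n + 1) + lgamma(q + lam) - lgamma(n + 1 + lam) - lgamma(q))

def interval_cov(n, q, qp, r, rp, lam):
    """Covariance of (x_q' - x_q) and (x_r' - x_r) for q < q' <= r < r'."""
    raw = mp(n, qp, rp, lam) + mp(n, q, r, lam) - mp(n, q, rp, lam) - mp(n, qp, r, lam)
    return raw - (mean(n, qp, lam) - mean(n, q, lam)) * (mean(n, rp, lam) - mean(n, r, lam))

def interval_var(n, q, qp, lam):
    """Variance of the interval x_q' - x_q."""
    raw = mp(n, qp, qp, lam) - 2 * mp(n, q, qp, lam) + mp(n, q, q, lam)
    return raw - (mean(n, qp, lam) - mean(n, q, lam)) ** 2

n, lam = 11, 5.0
r = interval_cov(n, 1, 2, 10, 11, lam) / sqrt(
    interval_var(n, 1, 2, lam) * interval_var(n, 10, 11, lam))
print(round(r, 6))
```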

We may illustrate further our theory of the ranking of the biquadratic curves 
by considering on the same curve the problem of total range distribution. The 
only expression I have been able to find for the frequency curve of range R is of 
the form* 


y 


i»)-l « / JB\* 

-»U) ,2. (ov,, ' , • 


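* It is, perhaps, better retained in the definite integral form, which exhibits its relation to the known
form in the case of a rectangular parent population.

The chance of $x_1$ and $x_n$ occurring with the values $a_{x_1}$ and $a_{x_n}$ is

$$n(n-1)\,da_{x_1}\,(a_{x_n}-a_{x_1})^{n-2}\,da_{x_n} = \frac{n(n-1)(1-m)^2}{a^2}\left(\frac{x_1}{a}\right)^{-m}\left(\left(\frac{x_n}{a}\right)^{1-m}-\left(\frac{x_1}{a}\right)^{1-m}\right)^{n-2}\left(\frac{x_n}{a}\right)^{-m}dx_1\,dx_n.$$

Put $R = x_n - x_1$, eliminate $x_n$ and integrate for all possible values of $x_1$, i.e. from 0 to $a - R$, then we
find the chance of a range lying between $R$ and $R + dR$ is

$$P = \frac{n(n-1)(1-m)^2}{a^2}\int_0^{a-R}\left(\frac{x}{a}\right)^{-m}\left(\left(\frac{x+R}{a}\right)^{1-m}-\left(\frac{x}{a}\right)^{1-m}\right)^{n-2}\left(\frac{x+R}{a}\right)^{-m}dx\,dR.$$

Write $z = \left(\dfrac{x}{x+R}\right)^{1-m}$, and we obtain after reductions

$$P = \frac{n(n-1)(1-m)}{a}\left(\frac{R}{a}\right)^{n(1-m)-1}dR\int_0^{\left(1-\frac{R}{a}\right)^{1-m}}\frac{(1-z)^{n-2}\,dz}{\left(1-z^{\frac{1}{1-m}}\right)^{n(1-m)}}.$$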

Write 


ite 2 = ^1 - ^ 

y_ Nn(n- 1) (1-m) 1 ^ gy 


J u y then the frequency ourve for the distribution of N ranges is 

r mlj nr 


This is expansible in a series like (cviii) above. 
Put m =0 and the integral becomes 




R\ r 

L-\ n ( i -» ) 

-a) ul 

-) 

1 

(a \ 

a 

U )• 


du 




H)*J, 

' .<»>• 






Karl Pearson 


249 


where $c_t$ is a function of $n$, $m$ and $t$. For the series I have not been able to find
any generative function. The graph of the curve of range-frequency could only be
constructed with much labour from this formula. Consequently I have had to fall
back on the determination of the $\beta$'s of the range distribution, and from them take
the corresponding frequency curve. This can afterwards be compared for a certain
number of selected points with the results obtained from (cix) by quadratures.

For the curve $y = \dfrac{N}{5a}\left(\dfrac{a}{x}\right)^{.8}$, we have by the formula (lxxiii), assuming $a = 1$,

$\bar x_1 = \{x_1\}$ = .0002,2893,7729, $\bar x_{11} = \{x_{11}\}$ = .6875,

$\mu_1' = \bar x_{11} - \bar x_1$ = .6872,7106,2271,

$\{x_1^2\}$ = .0000,0283,5142, $\{x_1x_{11}\}$ = .0001,7442,8746, $\{x_{11}^2\}$ = .5238,0952,3810,

giving $\mu_2' = \{(x_{11}-x_1)^2\}$ = .5234,6350,1460.

Again, $\{x_1^3\}$ = .0000,0012,9430, $\{x_{11}^3\}$ = .4230,7692,3077,

$\{x_1^2x_{11}\}$ = .0000,0228,9923, $\{x_1x_{11}^2\}$ = .0001,4088,4756,

giving $\mu_3' = \{(x_{11}-x_1)^3\}$ = .4226,6100,9148.


corresponding, after change of $a$ for $b$, and after change of origin, with the value given in Biometrika,
Vol. xxiii. p. 898. This is the range curve in the case of a rectangular population.

Another fairly easy solution can be obtained for this range frequency curve in the special case of

the curve $y = \dfrac{N}{2a}\left(\dfrac{a}{x}\right)^{.5}$. Here $m = .5$, and consequently the range has for its frequency distribution

fix, f 

U) l 1 -;) TTTrrW’ 


Take 


V a 1 




%'■ ?• 
V 1 -!” 




Then 

and for the limits of the integral we have 


1 + 
2 


V 7 ';(i.V'”, 


m= 0, is 3*1, and « = 1, w? — - 


Hence


■V-f 

rx **•-*- 


=u? 0 , say. 


JVn (n- 
2a (ft 




r. .-2S=S((^-(*V-n . •* 

a curve of range a, and easily graphed. 




250 


Rank- Variates and Rank-Intervals 


For the fourth moment

$\{x_1^4\}$ = .0000,0001,1810, $\{x_1^2x_{11}^2\}$ = .0000,0192,0580, $\{x_{11}^4\}$ = .3548,3870,9677,

$\{x_1^3x_{11}\}$ = .0000,0010,8554⁵, $\{x_1x_{11}^3\}$ = .0001,1816,1408⁵,

and accordingly $\mu_4' = \{(x_{11}-x_1)^4\}$ = .3543,7716,5114.

Lastly for the fifth moment

$\{x_1^5\}$ = .0000,0000,0208, $\{x_1^4x_{11}\}$ = .0000,0001,0170, $\{x_1^3x_{11}^2\}$ = .0000,0009,3477⁵,

$\{x_1^2x_{11}^3\}$ = .0000,0165,3833, $\{x_1x_{11}^4\}$ = .0001,0175,0102, $\{x_{11}^5\}$ = .3055,5555,5556,

whence $\mu_5' = \{(x_{11}-x_1)^5\}$ = .3050,6287,5781.


Transferring to the mean, we find in the usual way

$\mu_2$ = .0511,2198,8425, $\mu_3$ = −.0073,6963,4,

$\mu_4$ = .0066,4817,9, $\mu_5$ = −.0022,6303,7.

And thus we obtain, on reinstating $a$*,

$\sigma_R$ = .2261,0172$a$, $\beta_1$ = .406,5076, $\beta_2$ = 2.543,825.
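The moments of the range can be recomputed by expanding $(x_{11} - x_1)^s$ and inserting the mean products. The sketch below assumes the Gamma-function form of (lxxix) with $a = 1$ and $\lambda = 5$ for Curve (i), and stops at the fourth moment; `mixed_moment` and `range_raw_moment` are hypothetical names.

```python
from math import lgamma, exp, comb

def mixed_moment(n, k, s, lam):
    """{x_1^k x_n^(s-k)} for a = 1: lowest and highest ranks in samples of n."""
    return exp(lgamma(n + 1) - lgamma(n + 1 + s * lam)
               + lgamma(n + s * lam) - lgamma(n + k * lam)
               + lgamma(1 + k * lam))

def range_raw_moment(n, s, lam):
    """{(x_n - x_1)^s} by binomial expansion."""
    return sum((-1) ** k * comb(s, k) * mixed_moment(n, k, s, lam) for k in range(s + 1))

n, lam = 11, 5.0
m1, m2, m3, m4 = (range_raw_moment(n, s, lam) for s in (1, 2, 3, 4))
mu2 = m2 - m1 ** 2                                   # moments about the mean
mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
beta1, beta2 = mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2
print(m1, mu2, beta1, beta2)
```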

If we consider what form a Pearson curve would take with these values of $\beta_1$, $\beta_2$,
we find it to be of Type I, a limited range curve, and its equation is

/ x \1-2M,293 / x yUOO.OSa 

y = y# V 1 + -999,389a) l 1 “ -OOO^j .^ Cxn)- 


The range is thus .999,915$a$, only differing in the fifth place of decimals from $a$,
and its mean is at .687,432$a$, while the true mean is at .687,271$a$. It would seem
accordingly that we should obtain as good a fit by determining our curve from
mean, range, and standard deviation, as by using the first four moments. The
method of fitting is then very easy†. We have




1 * 202,187 


1 - 


\ 002,061 


■001,7114; 


...(cxiii). 


The constants clearly do not differ much from those of (cxii), but a better com¬ 
parison can be obtained by comparing the moment coefficients about the end of 
the range a. 



                Actual Values     Curve (cxii)      Curve (cxiii)

Mean (μ₁′)      .687,271a         .687,432a         .687,271a
Range           a                 .999,915a         a
μ₂′             .5234,6350a²      .5234,6350a²      .5234,6350a²
μ₃′             .4226,6101a³      .4226,6101a³      .4226,7231a³
μ₄′             .3543,7717a⁴      .3543,7717a⁴      .3544,0563a⁴
μ₅′             .3050,6288a⁵      .3051,4018a⁵      .3051,1026a⁵


The differences are practically very slight, but the curve (cxiii) is nearer in three of 
the seven quantities tabled to the true values than the curve (cxii), and (cxii) is only 


* The reader must remember that the relation between the $\sigma$ of the parent population and $a$ is given
by (lvii), or $a = \sigma \times 1.2\sqrt{11}$ = $\sigma$ × 3.9799,4976, or $\sigma$ = .2512,5945$a$, and $\sigma_R$ = .8998,7849$\sigma$, or about nine-tenths
of the parent population standard deviation.
† See Phil. Trans. Vol. 186 A, pp. 370-371.




Karl Pearson 


251 


nearer in two of the constants. The relative ease with which it can be computed 
suggests that it is the better method of fitting the distribution of range. How does 
it compare, however, with the true curve (cix) ? 

I owe to my colleague, Mr E. C. Fieller, the system of points obtained by
quadrature for the curve in (cix), and to my colleague, Miss Brenda Stoessiger, the
ordinates for plotting (cxii). See Diagram III, p. 252. The elasticity of Type I is
remarkable; it is here almost a triangle; but the points of the integral curve (cix)
to the third decimal show agreement with (cxii), and it is not feasible to indicate
any divergence even on the original diagram, which measured 10" × 4.5". This is
very good evidence that the range curve may be adequately obtained from range,
mean, and standard deviation, without using the troublesome quadratures necessary
in the case of (cix). The accordance is especially noteworthy considering the
smallness of the sample.

We may now proceed to test how far it is feasible to obtain the frequency-distribution
of any rank-interval, $i_{q,q'}$, between the $q'$th and $q$th ranks. We will
simply use the letter $i$ for brevity during the analysis, i.e. $i = x_{q'} - x_q$. The chance
of a value of $a_x$ falling in $da_x$, and a value of $a_{x'}$ in $da_{x'}$

$$= \frac{n!}{(q-1)!\,(q'-q-1)!\,(n-q')!}\;a_x^{\,q-1}\,(a_{x'}-a_x)^{q'-q-1}\,(1-a_{x'})^{n-q'}\,da_x\,da_{x'}.$$

Hence in the case of the curve $y = y_0(x/a)^{-m}$, which gives $a_x = (x/a)^{1-m}$ and
$da_x = \frac{1-m}{a}\left(\frac{x}{a}\right)^{-m}dx$, the chance of $x_q$ lying in $dx_q$ and $x_{q'}$ in $dx_{q'}$

$$= \frac{n!\,(1-m)^2}{a^2\,(q-1)!\,(q'-q-1)!\,(n-q')!}\left(\frac{x_q}{a}\right)^{q(1-m)-1}\left(\left(\frac{x_{q'}}{a}\right)^{1-m}-\left(\frac{x_q}{a}\right)^{1-m}\right)^{q'-q-1}\left(1-\left(\frac{x_{q'}}{a}\right)^{1-m}\right)^{n-q'}\left(\frac{x_{q'}}{a}\right)^{-m}dx_q\,dx_{q'}.$$

Let us write $x_{q'} = x_q + i$ and change our variables to $x_q$ and $i$, then the value
of $i$ remaining constant, we need to integrate $x_q$ from 0 to $a - i$ in order to obtain
the distribution curve of $i$. We will write $x_q/a = x'$, $i/a = i'$ and

$$\chi = \frac{n!\,(1-m)^2}{a\,(q-1)!\,(q'-q-1)!\,(n-q')!}.$$

Then dropping the $di'$, we have, for the distribution curve of $i'$,

$$y = \chi\int_0^{1-i'} x'^{\,q(1-m)-1}\left((x'+i')^{1-m}-x'^{\,1-m}\right)^{q'-q-1}\left(1-(x'+i')^{1-m}\right)^{n-q'}(x'+i')^{-m}\,dx'.$$

Write $x' = i'z/(1-z)$, and we find

$$y = \chi\; i'^{\,q'(1-m)-1}\int_0^{1-i'} z^{q(1-m)-1}\,\frac{(1-z^{1-m})^{q'-q-1}}{(1-z)^{n(1-m)}}\left((1-z)^{1-m}-i'^{\,1-m}\right)^{n-q'}dz \tag{cxiv}$$

This is the curve of distribution for the interrank interval $i_{q,q'} = ai'$. It does not
seem feasible to reduce it substantially in form except for special values of $q$
and $q'$; for example, in the case of the total range when $q = 1$, $q' = n$; see p. 248
above.







252 


Rank- Variates and Rank-Intervals 



DIAGRAM III. 






Karl Pearson 


253


Let «/»_,•-J o e* 1 a-m ) ((*■ ~ *) l_m “ (t') 1 * m ) B “ 4 ' dz, 

then * 0 when i — 1, and * 0 when s = 1 — i\ as long as p > 1, 

We can now consider the moment coefficients of (cxiv) about the end of the
range $a$ of the parent population, remembering that $i = ai'$.

We have /i/ = / yi 9 di = x aH ’ t J cK', 

or, putting i' (1 “ m> = t;, 

, /*«' = a* 1 //^ w'n^dv, 


where w 


' -f 
"-'-Jo 


^ (1—tit)—1 

~ (1 - *)" d-mi 


((1 — zf- m - y) n-fl/ 


Integrate by parts and we find the part outside the integral vanishes when
$v = 0$ and $v = 1$.


i) (* /t|J - ^ 

r 1 — TO v ' I , , s dv 


Hence 


C?W/ 

Now consists of two parts, that arising from the differentiation of the 

upper limit, and that from the differentiation of v in the integrand. The former 
vanishes, and the latter is — (n - q) Thus we have 




— a' f 1 n'j* 


f dx'. 

Jo 


We can continuously repeat this process until the power of $((1-z)^{1-m} - v)$ in
the $z$-integral is exhausted, and we then find

ft» = - «" +1 -—-——2^1---- f v n ~ u i^> w 0 dv. 

( ? ' + r^s.)( ? ' +1+ i^) ■(“- l+ irs)" 


f V n ~\-mw 0 dv. 

I 0 


We now once more integrate by parts, but the integrand in $w_0$ no longer
contains $v$, only the upper limit is a function of $v$. The part between limits


“ £ 11 
Jo 


still vanishes at both limits, and thus we reach 


r(g'+ 5 -^—)r(»-j'+i) . .... 

1 - M r^+i+j^-) J " v *•> 


where 


f 1 -«*“ m jgQ (1—Wl)—1 
0 ( 1 - ' 



Rank-Variates and Rank-Intervals


and accordingly 

m 1 

dwo 1 v l ~ m (1 — 
dv 1 — m v n 

Thus we have 


(1 — \ 


- (1 -^ r(„ + 1 + r v) 


x (1 — (1 — v l -™) L - m y i '- q - l dv. 


: = (1 — 


du ttc — v^~~ m (1 — v 1 "™)*™ dv, 
and when v = 0, u■= 1, and v «= 1, u = 0. Thus we have 


/*• “/i 


r («'+1~;,) r (» - >/ + 1 )., 

,1^-^— —i—=-r- 

(1 -’^ r, + i +r i-\ 

\ 1 - «l/ 


Substituting the value of we have 

, . r(«+i) r ( ? ' + fT^) 

H, =a*— -= 


■(»+i + r iL.) r( 9 >rtf-*> 

\ 1-m/ 


r(g)r ( g , -y) * 

r (?')”• 1! 


r (? + T^) r ^-^ 

r (?'+ 1 i^) 


»(*-!) 


r (? + dy r <» , -'> 

r (^) 


Or, symbolically, 

/i f — a p, / /v 


«<s-l)(«-2) r ( ,+ 1-m) r<5 ?) 

r (,■+>) r(, + °) 


rtf) r (. +lt • )' , F(,' + ») 

\ 1-m/ V 1 — m/ 


,(cxv bl ’). 


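If $s = 0$, we have

$$\mu_0' = 1.$$

If $s = 1$:

$$\mu_1' = a\,\frac{\Gamma(n+1)}{\Gamma\!\left(n+1+\frac{1}{1-m}\right)}\left\{\frac{\Gamma\!\left(q'+\frac{1}{1-m}\right)}{\Gamma(q')} - \frac{\Gamma\!\left(q+\frac{1}{1-m}\right)}{\Gamma(q)}\right\} \tag{cxvi}$$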







Karl Pearson 


255


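If $s = 2$:

$$\mu_2' = a^2\,\frac{\Gamma(n+1)}{\Gamma\!\left(n+1+\frac{2}{1-m}\right)}\left\{\frac{\Gamma\!\left(q'+\frac{2}{1-m}\right)}{\Gamma(q')} - 2\,\frac{\Gamma\!\left(q'+\frac{2}{1-m}\right)}{\Gamma\!\left(q'+\frac{1}{1-m}\right)}\cdot\frac{\Gamma\!\left(q+\frac{1}{1-m}\right)}{\Gamma(q)} + \frac{\Gamma\!\left(q+\frac{2}{1-m}\right)}{\Gamma(q)}\right\} \tag{cxvii}$$

If $s = 3$:

$$\mu_3' = a^3\,\frac{\Gamma(n+1)}{\Gamma\!\left(n+1+\frac{3}{1-m}\right)}\left\{\frac{\Gamma\!\left(q'+\frac{3}{1-m}\right)}{\Gamma(q')} - 3\,\frac{\Gamma\!\left(q'+\frac{3}{1-m}\right)}{\Gamma\!\left(q'+\frac{1}{1-m}\right)}\cdot\frac{\Gamma\!\left(q+\frac{1}{1-m}\right)}{\Gamma(q)} + 3\,\frac{\Gamma\!\left(q'+\frac{3}{1-m}\right)}{\Gamma\!\left(q'+\frac{2}{1-m}\right)}\cdot\frac{\Gamma\!\left(q+\frac{2}{1-m}\right)}{\Gamma(q)} - \frac{\Gamma\!\left(q+\frac{3}{1-m}\right)}{\Gamma(q)}\right\} \tag{cxviii}$$

If $s = 4$:

$$\mu_4' = a^4\,\frac{\Gamma(n+1)}{\Gamma\!\left(n+1+\frac{4}{1-m}\right)}\left\{\frac{\Gamma\!\left(q'+\frac{4}{1-m}\right)}{\Gamma(q')} - 4\,\frac{\Gamma\!\left(q'+\frac{4}{1-m}\right)}{\Gamma\!\left(q'+\frac{1}{1-m}\right)}\cdot\frac{\Gamma\!\left(q+\frac{1}{1-m}\right)}{\Gamma(q)} + 6\,\frac{\Gamma\!\left(q'+\frac{4}{1-m}\right)}{\Gamma\!\left(q'+\frac{2}{1-m}\right)}\cdot\frac{\Gamma\!\left(q+\frac{2}{1-m}\right)}{\Gamma(q)} - 4\,\frac{\Gamma\!\left(q'+\frac{4}{1-m}\right)}{\Gamma\!\left(q'+\frac{3}{1-m}\right)}\cdot\frac{\Gamma\!\left(q+\frac{3}{1-m}\right)}{\Gamma(q)} + \frac{\Gamma\!\left(q+\frac{4}{1-m}\right)}{\Gamma(q)}\right\} \tag{cxix}$$

and so on.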


The complete $\Gamma$-functions can be found by aid of the tables and thus the
moment coefficients about 0 calculated. These can be transferred to the
mean, and $\sigma_i$, $\beta_1$ and $\beta_2$ ascertained and then a Pearson curve fitted. There is
little doubt that it will give as satisfactory results as curves thus deduced for
the entire range.

It is clear that the forms we get for $\mu_s'$ are precisely what we obtain by
expanding $(x_{q'} - x_q)^s$ and using (lxxix). The only interest of the investigation is
the discovery of the intractable distribution curve (cxiv), and the transition to
the manageable moments of that curve.
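Since the moments of an interval are just the binomial expansion of $(x_{q'} - x_q)^s$ with the mean products inserted, they are simple to compute. The sketch below assumes the Gamma-function form of (lxxix) and uses Curve (ii) of the next illustration ($m = .5$, so $\lambda = 1/(1-m) = 2$) as a check; the function name is hypothetical.

```python
from math import lgamma, exp, comb, sqrt

def interval_raw_moment(n, q, qp, s, lam, a=1.0):
    """{(x_q' - x_q)^s}, q < q', by expanding the binomial and using the mean products."""
    total = 0.0
    for k in range(s + 1):                      # k = power of x_q in the term
        term = exp(lgamma(qp + s * lam) - lgamma(qp + k * lam)
                   + lgamma(q + k * lam) - lgamma(q))
        total += (-1) ** k * comb(s, k) * term
    return a ** s * exp(lgamma(n + 1) - lgamma(n + 1 + s * lam)) * total

n, lam = 11, 2.0                                # curve (ii): m = .5
mean_i = interval_raw_moment(n, 1, 2, 1, lam)
sigma_i = sqrt(interval_raw_moment(n, 1, 2, 2, lam) - mean_i ** 2)
print(mean_i, sigma_i)
```

The two printed values are the mean and standard deviation of the first interrank interval in units of $a$.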


Illustration II. Type VIII. 

Curve (ii): $y = \dfrac{N}{2a}\left(\dfrac{a}{x}\right)^{.5}$.


$\sigma$ = .2981,4240$a$, $\beta_1$ = 1.428,571, $\beta_2$ = 2.142,857.




256 


Rank-Variates and Rank-Intervals 


This curve is given in Fig. II. 



Here by (lxxvi) and (lxxvii) we have ratio of first to second prizes 21 to 10,
and ratio of first and second penalties 5 to 3. It will thus appear that the Galton
Ratios change considerably with the nature of the parent population, e.g.

Parent Population               Prize Ratio        Penalty Ratio

y = (N/5a)(a/x)^.8              24 to 10           13 to 10 (4 to 3)
y = (N/2a)(a/x)^.5              21 to 10           17 to 10 (5 to 3)
Rectangle y = N/a               20 to 10           20 to 10 (6 to 3)
Exponential Curve               30 to 10           19 to 10
Normal Curve                    circa 26 to 10     26 to 10

Further, the penalty ratio* is less than the prize ratio in all cases, where there
is lack of symmetry.
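The first four rows of this table can be reproduced with exact fractions for samples of 11. The sketch below assumes the product form of the Type VIII means for integer $\lambda = 1/(1-m)$ (with $\lambda = 1$ giving the rectangle) and the standard result that the mean of the $q$th rank from a unit exponential parent is $\sum_{j=1}^{q} 1/(n-j+1)$; all function names are illustrative.

```python
from fractions import Fraction

def power_law_means(n, lam):
    """Mean ranks (units of a) for y proportional to x**(-m) on (0, a), integer lam = 1/(1 - m)."""
    means = {}
    for q in range(1, n + 1):
        value = Fraction(1)
        for j in range(q, n + 1):
            value *= Fraction(j, j + lam)
        means[q] = value
    return means

def exponential_means(n):
    """Mean ranks for a unit exponential parent: sum of 1/(n - j + 1), j = 1..q."""
    means, total = {}, Fraction(0)
    for q in range(1, n + 1):
        total += Fraction(1, n - q + 1)
        means[q] = total
    return means

def galton_ratios(x, n):
    """Prize ratio at the top of the sample and penalty ratio at the bottom."""
    prize = (x[n] - x[n - 2]) / (x[n - 1] - x[n - 2])
    penalty = (x[3] - x[1]) / (x[3] - x[2])
    return prize, penalty

for label, x in (("curve (i)", power_law_means(11, 5)),
                 ("curve (ii)", power_law_means(11, 2)),
                 ("rectangle", power_law_means(11, 1)),
                 ("exponential", exponential_means(11))):
    print(label, galton_ratios(x, 11))
```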

We have the following further results:

Rank     Mean Rank Variates        Standard Deviation of Rank Variates

 1       x̄₁  = .0128,2051,28a      σ = .0238,3768,41a
 2       x̄₂  = .0384,6153,85a      σ = .0467,3022,28a
 3       x̄₃  = .0769,2307,69a      σ = .0712,1693,08a
 4       x̄₄  = .1282,0512,82a      σ = .0959,3993,30a
 5       x̄₅  = .1923,0769,23a      σ = .1195,8178,27a
 6       x̄₆  = .2692,3076,92a      σ = .1407,9234,79a
 7       x̄₇  = .3589,7435,90a      σ = .1580,6189,75a
 8       x̄₈  = .4615,3846,15a      σ = .1695,3020,47a
 9       x̄₉  = .5769,2307,69a      σ = .1726,1844,04a
10       x̄₁₀ = .7051,2820,51a      σ = .1631,0652,40a
11       x̄₁₁ = .8461,5384,61a      σ = .1317,4597,49a


* The penalty ratio is taken from the mediocrity end of the skew curves, or from those less than mediocrity.





Karl Pearson 


257 


We see in the case of this curve the absolute variability reaching a maximum
for the variability of the 9th of the eleven ranks, whereas in the previous case it
reached its maximum in the tenth rank only. On the other hand, the relative
variation decreases from the lowest to the highest rank and, although the size of
the sample is the same, is much less than in Curve (i). Thus:

$V_1$ = 185.934, $V_8$ = 38.521,

$V_2$ = 121.149, $V_9$ = 29.921,

$V_3$ = 92.582, $V_{10}$ = 23.131,

$V_4$ = 74.833, $V_{11}$ = 15.570.

Passing to the Rank-Intervals we have

Interval       Mean                  Standard Deviation

x₂ − x₁        .0256,4102,57a        .0360,0194,10a
x₃ − x₂        .0384,6153,84a        .0467,3022,28a
x₄ − x₃        .0512,8205,13a        .0568,4150,84a
x₅ − x₄        .0641,0256,41a        .0666,1733,88a
x₆ − x₅        .0769,2307,69a        .0769,2307,69a
x₇ − x₆        .0897,4358,97a        .0856,1952,75a
x₈ − x₇        .1025,6410,25a        .0949,5590,75a
x₉ − x₈        .1153,8461,54a        .1042,2194,95a
x₁₀ − x₉       .1282,0512,82a        .1134,3489,07a
x₁₁ − x₁₀      .1410,2564,11a        .1226,0664,49a


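Now a very remarkable result flows from the intervals: the difference of the
intervals (the second difference of the rank variates) is constant = .0128,2051,28.
Is this peculiar to this curve, or a more general result? A very brief investigation
shows that it is not a general principle, but the fact for this case suggested an
inquiry into the differences of the mean variates of ranks. Returning to the formula
(lxxiii) we have

$$\Delta\bar x_q = \bar x_{q+1} - \bar x_q = \frac{a\,\Gamma(n+1)}{\Gamma\!\left(n+1+\frac{1}{1-m}\right)}\left(\frac{\Gamma\!\left(q+1+\frac{1}{1-m}\right)}{\Gamma(q+1)} - \frac{\Gamma\!\left(q+\frac{1}{1-m}\right)}{\Gamma(q)}\right) = \frac{a\,\Gamma(n+1)}{\Gamma\!\left(n+1+\frac{1}{1-m}\right)}\cdot\frac{\Gamma\!\left(q+\frac{1}{1-m}\right)}{\Gamma(q+1)}\cdot\frac{1}{1-m},$$

$$\Delta^2\bar x_q = \frac{a\,\Gamma(n+1)}{\Gamma\!\left(n+1+\frac{1}{1-m}\right)}\cdot\frac{\Gamma\!\left(q+\frac{1}{1-m}\right)}{\Gamma(q+2)}\cdot\frac{1}{1-m}\left(\frac{1}{1-m}-1\right),$$

and continuing

$$\Delta^s\bar x_q = \frac{a\,\Gamma(n+1)}{\Gamma\!\left(n+1+\frac{1}{1-m}\right)}\cdot\frac{\Gamma\!\left(q+\frac{1}{1-m}\right)}{\Gamma(q+s)}\cdot\frac{1}{1-m}\left(\frac{1}{1-m}-1\right)\cdots\left(\frac{1}{1-m}-(s-1)\right).$$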








258 


Rank-Variates and Rank-Intervals 


Therefore when $s = 1 + \frac{1}{1-m}$, the $s$th difference of the mean rank-variates (the
$(s-1)$th difference of the mean rank-intervals) will vanish. When $m = .5$, $s = 3$, or the
third mean rank-variate difference, the second mean rank-interval difference, is
zero, as we have just noted. When $m = 0$, $\Delta^2\bar x_q = 0$, or for a rectangular parent
population all the mean rank-intervals are equal. Again, if $m = .8$, the sixth mean
rank-variate differences are zero, or the fourth mean interrank interval differences
are constant. Similar properties hold for the differences of the $(n, q)$ moment
coefficients, but it is doubtful whether they would be of service for computing purposes.
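The vanishing of the higher differences can be checked exactly. The sketch below assumes the product form of the mean rank-variates for integer $\lambda = 1/(1-m)$, in units of $a$; `mean_ranks` and `differences` are names chosen for this illustration.

```python
from fractions import Fraction

def mean_ranks(n, lam):
    """Mean rank-variates in units of a for integer lam = 1/(1 - m)."""
    out = []
    for q in range(1, n + 1):
        value = Fraction(1)
        for j in range(q, n + 1):
            value *= Fraction(j, j + lam)
        out.append(value)
    return out

def differences(values, order):
    """Forward differences of the given order."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values

x = mean_ranks(11, 2)               # m = .5 gives 1/(1 - m) = 2
second = differences(x, 2)
third = differences(x, 3)
print(float(second[0]), set(third))
```

For $m = .8$ the same functions with `lam = 5` show constant fifth differences and vanishing sixth differences of the mean rank-variates.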

Returning to the results of the rank-intervals on p. 257 we see a marked
change in the ratio of the first to the last; it now lies between 6 and 7 instead of
amounting to 200, as in the case of curve (i); thus there is an approach to the
equality of rank-intervals, which comes with the rectangle. A similar remark
applies to the absolute variabilities of the last and first intervals. For curve (i)
their ratio was 39.5, while for the present curve it is only 3.4.

Our results for $\{x_qx_{q'}\}$ and $\sigma_{x_q}$ enable us to obtain the following system of
correlations:

$r_{x_1x_2}$ = .6534,5026,        $r_{x_1x_3}$ = .4979,2960,

$r_{x_2x_3}$ = .7620,0077,        $r_{x_1x_4}$ = .4024,7170,

$r_{x_3x_4}$ = .8082,9038,        . . . . . .

. . . . . .                       $r_{x_1x_7}$ = .2368,1276,

$r_{x_7x_8}$ = .8342,1006,        $r_{x_1x_8}$ = .1975,5159,

$r_{x_8x_9}$ = .8145,7314,        $r_{x_1x_9}$ = .1609,2022,

$r_{x_9x_{10}}$ = .7730,9696,     $r_{x_1x_{10}}$ = .1244,0692,

$r_{x_{10}x_{11}}$ = .6731,0804,  $r_{x_1x_{11}}$ = .0837,3929.

These values, all found directly, confirm the general theory that if $q < q' < q''$, then
$r_{x_qx_{q''}} = r_{x_qx_{q'}}\,r_{x_{q'}x_{q''}}$. We see further that the correlation between the variates of adjacent
ranks reaches its maximum not at the terminals but towards the centre of the
system. The second column indicates the reduction in association as our ranks are
farther and farther apart: the variates of the first and second ranks have almost
eight times as close an association as those of the first and last ranks. Lastly, we
may note that the tendency for the correlations in the first column to become a
symmetrical system indicates that our curve is approaching the rectangular parent
population.

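We have the following system of correlations for interrank distances and
interrank distances with rank-variates:

$r_{x_2-x_1,\,x_3-x_2}$ = .1395,6901,    $r_{x_3,\,x_2-x_1}$ = .6593,8048,

$r_{x_2-x_1,\,x_4-x_3}$ = .0734,3468,    $r_{x_3,\,x_4-x_3}$ = .1113,6920,

$r_{x_3-x_2,\,x_4-x_3}$ = .0848,6342,    $r_{x_3,\,x_3-x_2}$ = .7620,0076,

$r_{x_2-x_1,\,x_6-x_5}$ = −.0068,5631 (shows change of sign occurs about median interval),

$r_{x_2-x_1,\,x_8-x_7}$ = −.0549,4837,    $r_{x_3-x_2,\,x_8-x_7}$ = −.0635,0006,    $r_{x_4-x_3,\,x_8-x_7}$ = −.0696,0576,

$r_{x_2-x_1,\,x_9-x_8}$ = −.0725,9148,    $r_{x_3-x_2,\,x_9-x_8}$ = −.0838,8890,    $r_{x_4-x_3,\,x_9-x_8}$ = −.0919,5513,

$r_{x_2-x_1,\,x_{10}-x_9}$ = −.0873,9443,    $r_{x_3-x_2,\,x_{10}-x_9}$ = −.1009,9575,    $r_{x_4-x_3,\,x_{10}-x_9}$ = −.1107,0675,

$r_{x_2-x_1,\,x_{11}-x_{10}}$ = −.1000,0705,    $r_{x_3-x_2,\,x_{11}-x_{10}}$ = −.1155,7130,    $r_{x_4-x_3,\,x_{11}-x_{10}}$ = −.1266,8378.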






Karl Pearson 


259 


The numerical results indicate at once the following conclusions: 

(a) The correlation even between adjacent intervals is not high; it starts 
positive and rapidly falls, becoming negative, and reaches a negative maximum 
value when the intervals are farthest apart. 

(b) A rank-variate and an interval below it are highly, but a rank-variate and 
an interval above it, i.e. on the side of the tail, are not highly correlated even if 
adjacent. 

(c ) The change of sign in the interval correlation corresponds to a change of 
sign in the correlation of the rank-variate with an interval above it. 

(d) It is clear from the numerical values that

$$r_{x_2-x_1,\,x_4-x_3} = r_{x_3,\,x_2-x_1} \times r_{x_3,\,x_4-x_3}, \qquad r_{x_3-x_2,\,x_4-x_3} = r_{x_3,\,x_3-x_2} \times r_{x_3,\,x_4-x_3}.$$

This is a general theorem, quite independent of our special parent population, and 
it may be proved as follows. 

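If $q < q' < q'' < q''' < q^{iv}$, let us consider 

\[
r_{x_{q'}-x_q,\;x_{q^{iv}}-x_{q'''}}
= \frac{\{(x_{q'}-x_q)(x_{q^{iv}}-x_{q'''})\} - (\bar{x}_{q'}-\bar{x}_q)(\bar{x}_{q^{iv}}-\bar{x}_{q'''})}
       {\sigma_{x_{q'}-x_q}\,\sigma_{x_{q^{iv}}-x_{q'''}}}
\]
\[
= \frac{\{x_{q'}x_{q^{iv}}\} - \bar{x}_{q'}\bar{x}_{q^{iv}} - \{x_{q'}x_{q'''}\} + \bar{x}_{q'}\bar{x}_{q'''}
        - \{x_q x_{q^{iv}}\} + \bar{x}_q\bar{x}_{q^{iv}} + \{x_q x_{q'''}\} - \bar{x}_q\bar{x}_{q'''}}
       {\sigma_{x_{q'}-x_q}\,\sigma_{x_{q^{iv}}-x_{q'''}}}
\]
\[
= \frac{\sigma_{x_{q'}}\sigma_{x_{q^{iv}}}\,r_{x_{q'},x_{q^{iv}}}
        - \sigma_{x_{q'}}\sigma_{x_{q'''}}\,r_{x_{q'},x_{q'''}}
        - \sigma_{x_q}\sigma_{x_{q^{iv}}}\,r_{x_q,x_{q^{iv}}}
        + \sigma_{x_q}\sigma_{x_{q'''}}\,r_{x_q,x_{q'''}}}
       {\sigma_{x_{q'}-x_q}\,\sigma_{x_{q^{iv}}-x_{q'''}}}.
\]

But 

\[
r_{x_{q'},x_{q^{iv}}} = r_{x_{q'},x_{q''}}\,r_{x_{q''},x_{q^{iv}}}, \qquad
r_{x_{q'},x_{q'''}} = r_{x_{q'},x_{q''}}\,r_{x_{q''},x_{q'''}},
\]
\[
r_{x_q,x_{q^{iv}}} = r_{x_q,x_{q''}}\,r_{x_{q''},x_{q^{iv}}}, \qquad
r_{x_q,x_{q'''}} = r_{x_q,x_{q''}}\,r_{x_{q''},x_{q'''}},
\]

by formula (ciii). Hence 

\[
r_{x_{q'}-x_q,\;x_{q^{iv}}-x_{q'''}}
= \frac{\left(\sigma_{x_{q'}}\,r_{x_{q'},x_{q''}} - \sigma_{x_q}\,r_{x_q,x_{q''}}\right)
        \left(\sigma_{x_{q^{iv}}}\,r_{x_{q^{iv}},x_{q''}} - \sigma_{x_{q'''}}\,r_{x_{q'''},x_{q''}}\right)}
       {\sigma_{x_{q'}-x_q}\,\sigma_{x_{q^{iv}}-x_{q'''}}}
\tag{cxxi}
\]
\[
= \frac{\sigma_{x_{q'}}\sigma_{x_{q''}}\,r_{x_{q'},x_{q''}} - \sigma_{x_q}\sigma_{x_{q''}}\,r_{x_q,x_{q''}}}
       {\sigma_{x_{q''}}\,\sigma_{x_{q'}-x_q}}
  \times
  \frac{\sigma_{x_{q^{iv}}}\sigma_{x_{q''}}\,r_{x_{q^{iv}},x_{q''}} - \sigma_{x_{q'''}}\sigma_{x_{q''}}\,r_{x_{q'''},x_{q''}}}
       {\sigma_{x_{q''}}\,\sigma_{x_{q^{iv}}-x_{q'''}}}
\]
\[
= \frac{\{x_{q''}(x_{q'}-x_q)\} - \bar{x}_{q''}(\bar{x}_{q'}-\bar{x}_q)}
       {\sigma_{x_{q''}}\,\sigma_{x_{q'}-x_q}}
  \times
  \frac{\{x_{q''}(x_{q^{iv}}-x_{q'''})\} - \bar{x}_{q''}(\bar{x}_{q^{iv}}-\bar{x}_{q'''})}
       {\sigma_{x_{q''}}\,\sigma_{x_{q^{iv}}-x_{q'''}}}
\]
\[
= r_{x_{q''},\,x_{q'}-x_q} \times r_{x_{q''},\,x_{q^{iv}}-x_{q'''}}.
\]

We thus find that the partial correlation 

\[
r_{x_{q'}-x_q,\;x_{q^{iv}}-x_{q'''}\,\cdot\,x_{q''}} = 0,
\tag{cxxii}
\]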




or, if we fix the variate of any rank $q''$, then the correlation between any two 
interrank intervals, one on either side of $q''$, is zero. This clearly follows from the 
general principle, that fixing the variate of any rank is equivalent to dividing the 
parental population into two portions and then taking at random $q'' - 1$ individuals 
from the first portion and $n - q''$ from the second portion; the distribution of these 
two sets of individuals must be completely independent. 
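
The theorem can be checked numerically in the one case where every moment is available in closed form. The following Python sketch assumes a rectangular parent of unit range and uses the standard covariance of ranked uniform variates, $i(n+1-j)/\{(n+1)^2(n+2)\}$ for $i \le j$. The function names and the particular ranks chosen are illustrative assumptions, not taken from the paper.

```python
import math

def cov_uniform(i, j, n):
    """Covariance of the i-th and j-th ranked variates (1-based) in a sample
    of n from a rectangle of unit range (standard result, assumed here)."""
    i, j = min(i, j), max(i, j)
    return i * (n + 1 - j) / ((n + 1) ** 2 * (n + 2))

def cov_lin(u, v, n):
    """Covariance of two linear combinations of rank-variates, each given
    as a {rank: coefficient} dictionary."""
    return sum(a * b * cov_uniform(i, j, n)
               for i, a in u.items() for j, b in v.items())

def corr_lin(u, v, n):
    """Correlation of two linear combinations of rank-variates."""
    return cov_lin(u, v, n) / math.sqrt(cov_lin(u, u, n) * cov_lin(v, v, n))

n = 11
q, q1, q2, q3, q4 = 1, 2, 4, 6, 8      # q < q' < q'' < q''' < q^iv
lower = {q1: 1.0, q: -1.0}             # interval x_q' - x_q
upper = {q4: 1.0, q3: -1.0}            # interval x_q^iv - x_q'''
pivot = {q2: 1.0}                      # rank-variate x_q''

direct = corr_lin(lower, upper, n)
product = corr_lin(pivot, lower, n) * corr_lin(pivot, upper, n)
print(direct, product)  # equal, so the partial correlation for fixed x_q'' is zero
```

Any other choice of ranks in the stated order should give the same agreement, since the product form does not depend on the particular ranks.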







260 Rank - Variates and Rank-Intervals 


Formula (cxxii) is a convenient form for testing the accuracy with which any 
interrank distance correlation has been determined. If we remember that any 
interrank distance variance can be written as 

\[
\sigma^2_{x_{q''}-x_{q'}} = \sigma^2_{x_{q''}} + \sigma^2_{x_{q'}} - 2\,r_{x_{q'}x_{q''}}\,\sigma_{x_{q'}}\,\sigma_{x_{q''}},
\]

and that 

\[
r_{x_q,\,x_{q''}-x_{q'}} = \frac{r_{x_q x_{q''}}\,\sigma_{x_{q''}} - r_{x_q x_{q'}}\,\sigma_{x_{q'}}}{\sigma_{x_{q''}-x_{q'}}},
\tag{cxxiii}
\]

it will be clear that the whole system of correlations between the rank-variates and 
the interrank distances can be reduced by aid of formulae (ciii), (cxxii) and (cxxiii) 
to the calculation of the $\sigma_{x_q}$'s and the $r_{x_q x_{q+1}}$'s. The work can thus be very much 
simplified. Ultimately it depends on a knowledge of $\{x_q\}$, $\{x_q^2\}$, $\{x_q x_{q+1}\}$* throughout 
the whole n ranks of the sample. And this is true whatever be the character of the 
parental population, supposed continuous. 


Illustration III 

Rectangular Parent Populations. Curve (iii). The change from the asymptotic 
curves is indicated in Fig. III. y = N/b, σ = ·288,6756b, β₁ = 0, β₂ = 1·8. The theory 
has been given in Part I of this paper. 

[Fig. III: Curve (iii), y = 100.] 

Additional formulae which may be of service are 

\[
\sigma_{x_{q'}-x_q} = \sqrt{\frac{(q'-q)(n+1-q'+q)}{n+2}}\;\frac{b}{n+1},
\]

which, if $q' = q+1$, gives 

\[
\sigma_{x_{q+1}-x_q} = \sqrt{\frac{n}{n+2}}\;\frac{b}{n+1},
\]

a value independent of $q$, and if $q < q' < q'' < q'''$, 

\[
r_{x_{q'}-x_q,\;x_{q'''}-x_{q''}} = -\sqrt{\frac{(q'-q)(q'''-q'')}{(n+1-q'+q)(n+1-q'''+q'')}},
\]

or if $q' = q+1$, $q''' = q''+1$, 

\[
r_{x_{q+1}-x_q,\;x_{q''+1}-x_{q''}} = -\frac{1}{n},
\]

a quantity independent of both $q$ and $q''$, or all rank-intervals have the same 
negative correlation. 
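
As a quick arithmetical check, the two formulae above can be evaluated for a sample of 11. The Python sketch below is illustrative only; the function names and the choice b = 1 are assumptions made for the example.

```python
import math

def interval_sd(q, q_prime, n, b=1.0):
    """Standard deviation of the interval x_q' - x_q (q < q') for a
    rectangular parent of range b."""
    d = q_prime - q
    return math.sqrt(d * (n + 1 - d) / (n + 2)) * b / (n + 1)

def interval_corr(q, q1, q2, q3, n):
    """Correlation of x_q' - x_q with x_q''' - x_q'' for q < q' < q'' < q'''."""
    d1, d2 = q1 - q, q3 - q2
    return -math.sqrt(d1 * d2 / ((n + 1 - d1) * (n + 1 - d2)))

n = 11
adjacent_sds = [interval_sd(q, q + 1, n) for q in range(1, n)]
print(adjacent_sds[0])                 # about 0.07665552, in units of b
print(interval_corr(1, 2, 5, 6, n))    # -1/11 for any pair of unit intervals
```

Every entry of `adjacent_sds` is the same number, which is the constancy remarked on in the text.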

* It is not indeed needful in the cases under consideration to find $\{x_q x_{q+1}\}$, for $r_{x_q x_{q+1}} = \dfrac{\sigma_{x_{q+1}}}{\bar{x}_{q+1}} \Big/ \dfrac{\sigma_{x_q}}{\bar{x}_q}$ by 
(cvii), a result which shows us that the relative variability always must decrease with increasing rank. 








In the case of samples of 11, 

$\sigma_{x_{q+1}-x_q}$ = ·0766,5551,76b, 
$r_{x_{q+1}-x_q,\;x_{q''+1}-x_{q''}}$ = -·0909,0909,09. 

Comparing the first of these results with those on p. 256, a being the total range 
of curve and b of rectangle, we see that the result is obtained by the middle 
interval ($x_6 - x_5$) almost retaining its variability while those below creep up to its 
value, and those above fall down to it. 

Comparing the second result with those on p. 258, we note that the negative 
correlation has spread right through the whole system, and then the values have 
been equalised as we pass from the highest to the lowest at a value less than the 
highest. 


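Further, if $q'' > q' > q$, 

\[
r_{x_{q''},\,x_{q'}-x_q} = \sqrt{\frac{(q'-q)(n+1-q'')}{q''(n+1-q'+q)}},
\]
\[
r_{x_{q'},\,x_{q'}-x_q} = \sqrt{\frac{(q'-q)(n+1-q')}{q'(n+1-q'+q)}}, \quad \text{if } q'' = q',
\]
\[
r_{x_{q+1},\,x_{q+1}-x_q} = \sqrt{\frac{n-q}{n(q+1)}}, \quad \text{if, in addition, } q' = q+1.
\]

Again, 

\[
r_{x_q,\,x_{q'}-x_q} = -\sqrt{\frac{q(q'-q)}{(n+1-q)(n+1-q'+q)}},
\]

and, if $q' = q+1$, 

\[
r_{x_q,\,x_{q+1}-x_q} = -\sqrt{\frac{q}{n(n+1-q)}}.
\]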

Hence we find for n = 11, 

= -6741,9986, --1348,3997, 

- -5222,3297, - - -1348,3997, 


-5222,3297, r, 


*1* ®4~®> 


- 1740,7766, 
i xt-m - *4264,0143, = -1740,7766, 




:-4264,0143, *2132,0072, 

= *4264,0143, - - *2132,0072. 

We see at once from these results that the correlation of the variate of any rank 
with any rank-interval below it is constant. This follows at once, if we put $q' = q+1$, 
when we find 

\[
r_{x_{q''},\,x_{q+1}-x_q} = \sqrt{\frac{n+1-q''}{n\,q''}},
\]

which is independent of $q$. 

A similar property holds for the correlation of the variate of any rank with any 
interval above it; for 

\[
r_{x_q,\,x_{q''}-x_{q'}} = -\sqrt{\frac{q(q''-q')}{(n+1-q)(n+1-q''+q')}}.
\]

Hence, if $q'' = q'+1$, 

\[
r_{x_q,\,x_{q'+1}-x_{q'}} = -\sqrt{\frac{q}{n(n+1-q)}},
\]

which is independent of $q'$. 

This relation is also obvious in the above table of the correlations. 
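
The closed forms for the rectangle can be evaluated directly. The Python sketch below is an illustration under the formulae just stated; the function names are assumptions of the example, and the ranks used in the last line are arbitrary.

```python
import math

def corr_rank_with_lower_interval(q, q_prime, q_dd, n):
    """r between x_q'' and the interval x_q' - x_q, with q < q' <= q''."""
    return math.sqrt((q_prime - q) * (n + 1 - q_dd)
                     / (q_dd * (n + 1 - q_prime + q)))

def corr_rank_with_upper_interval(q, q_prime, q_dd, n):
    """r between x_q and the interval x_q'' - x_q', with q <= q' < q''."""
    return -math.sqrt(q * (q_dd - q_prime)
                      / ((n + 1 - q) * (n + 1 - q_dd + q_prime)))

n = 11
r_below = corr_rank_with_lower_interval(1, 2, 2, n)   # x_2 with x_2 - x_1
r_above = corr_rank_with_upper_interval(2, 2, 3, n)   # x_2 with x_3 - x_2
print(r_below, r_above)   # about 0.67419986 and -0.13483997

# Rank 4 with a unit interval below it and a unit interval above it:
# the product is the common interval correlation -1/n.
product = (corr_rank_with_lower_interval(1, 2, 4, n)
           * corr_rank_with_upper_interval(4, 5, 6, n))
```

The first two values reproduce entries of the table for n = 11, and `product` equals -1/11.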

Clearly, the correlation of variate of $q$th rank with any interval below it × correlation 
of variate of $q$th rank with any other interval above it $= -\frac{1}{n}$ = correlation 
of the first interval with the second interval, e.g. 

r «*. **-*! X (B|) 0|—(&t 9 


, C 04 — W% X ? OBi, «§—~ lV 5=8 ’ 


This is a special case of the theorem already stated (see p. 259) emphasised by 
the fact that here the correlation of any pair of intervals $= -\frac{1}{n} = -\frac{1}{11}$. 

Turning from the correlations to the actual distribution of rank-variates, we 
have the mean variates of the successive ranks distributed at $\frac{1}{12}b$ apart in the case 
of samples of 11 starting at distance $\frac{1}{12}b$ from the start of the range. We have the 
following symmetrical system of standard deviations: 

$\sigma_{x_1}$ = ·0766,5551,76b = $\sigma_{x_{11}}$, 
$\sigma_{x_2}$ = ·1033,6227,88b = $\sigma_{x_{10}}$, 
$\sigma_{x_3}$ = ·1201,2813,67b = $\sigma_{x_9}$, 
$\sigma_{x_4}$ = ·1307,4409,01b = $\sigma_{x_8}$, 
$\sigma_{x_5}$ = ·1367,3544,24b = $\sigma_{x_7}$, 
$\sigma_{x_6}$ = ·1386,7504,91b. 

The variation rises from a minimum at the first and last ranks, and afterwards 
more slowly reaches a maximum at the median. Since $x_5$, $x_6$ and $x_7$ have the 
highest variation, it might be expected that their intervals would have, but this 
high variation is accompanied by a high correlation, which causes the variation of 
all intervals to be the same. 

This can be shewn on the rank-variate correlations. Here we have* 

$r_{x_1,x_2}$ = ·6741,9986 = $r_{x_{10},x_{11}}$, 
$r_{x_2,x_3}$ = ·7745,9667 = $r_{x_9,x_{10}}$, 
$r_{x_3,x_4}$ = ·8164,9658 = $r_{x_8,x_9}$, 
$r_{x_4,x_5}$ = ·8366,6003 = $r_{x_7,x_8}$, 
$r_{x_5,x_6}$ = ·8451,5425 = $r_{x_6,x_7}$. 

The rank-variates and rank-intervals when we sample from a rectangular popula¬ 
tion being so easy to analyse, it is somewhat sad that such parental distributions 
are so rare as to be of little practical importance. 
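
The way in which high variability and high correlation offset one another can be traced numerically. The Python sketch below assumes the rectangular-parent expressions for the standard deviation of a rank-variate and for the correlation of two rank-variates, and combines them by the variance formula for an interrank distance; the function names and the choice b = 1 are assumptions of the example.

```python
import math

def rank_sd(q, n, b=1.0):
    """Standard deviation of the q-th rank-variate, rectangular parent of range b."""
    return math.sqrt(q * (n + 1 - q) / (n + 2)) * b / (n + 1)

def rank_corr(q, q_prime, n):
    """Correlation of the q-th and q'-th rank-variates (q < q')."""
    return math.sqrt(q * (n + 1 - q_prime) / (q_prime * (n + 1 - q)))

def interval_sd_from_parts(q, n):
    """S.D. of x_{q+1} - x_q built from the two rank S.D.s and their correlation."""
    s1, s2 = rank_sd(q, n), rank_sd(q + 1, n)
    r = rank_corr(q, q + 1, n)
    return math.sqrt(s1 * s1 + s2 * s2 - 2.0 * r * s1 * s2)

n = 11
for q in range(1, n):
    print(q, rank_sd(q, n), rank_corr(q, q + 1, n), interval_sd_from_parts(q, n))
```

The last column is the same for every q, although the first two columns both rise towards the median.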


* $r_{x_q x_{q'}} = \sqrt{\dfrac{q}{q'}\cdot\dfrac{n+1-q'}{n+1-q}}$: see Biometrika, Vol. XXIII, p. 391 (xxvii). 





Illustration IV. Type IX. 

Curve (iv): $y = 110\,(x/a)^{0\cdot 1}$. 
Range from x = 0 to x = a. 

σ = a × ·2836,5876, β₁ = ·006,706, β₂ = 1·831,616. 

This curve is drawn in Fig. IV, and indicates the rectangle changing to the 
sloping straight line; there is here no asymptote, but still a finite range, and finite 
ordinate at a. As the power increases the ordinate at a becomes greater, and the 
curvature changes from concave to convex at the line point, see Figs. VI, VII, and 
VIII, below. Ultimately the range becomes infinite, while the ordinate remains 
finite at the Exponential Point. 

[Fig. IV] 

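These curves fall under Type IX with the formulae for samples of 11: 

\[
\bar{x}_q = a\,\frac{\Gamma(12)}{\Gamma(q)}\,\frac{\Gamma(q+\tfrac{10}{11})}{\Gamma(12+\tfrac{10}{11})},
\]
\[
\{x_q^2\} = a^2\,\frac{\Gamma(12)}{\Gamma(q)}\,\frac{\Gamma(q+1+\tfrac{9}{11})}{\Gamma(13+\tfrac{9}{11})},
\]
\[
\{x_q x_{q'}\} = a^2\,\frac{\Gamma(12)}{\Gamma(13+\tfrac{9}{11})}\,\frac{\Gamma(q'+1+\tfrac{9}{11})}{\Gamma(q'+\tfrac{10}{11})}\,\frac{\Gamma(q+\tfrac{10}{11})}{\Gamma(q)}.
\]

From these results we obtain 

              For Curve (iv)        For Rectangle 

$\bar{x}_1$    = ·1011,3289,73a,      ·0833,3333,33a, 
$\bar{x}_2$    = ·1930,7189,49a,      ·1666,6666,67a, 
$\bar{x}_3$    = ·2808,3184,72a,      ·2500,0000,00a, 
$\bar{x}_4$    = ·3659,3240,69a,      ·3333,3333,33a, 
$\bar{x}_5$    = ·4490,9886,30a,      ·4166,6666,67a, 
$\bar{x}_6$    = ·5307,5320,17a,      ·5000,0000,00a, 
$\bar{x}_7$    = ·6111,7035,35a,      ·5833,3333,33a, 
$\bar{x}_8$    = ·6905,4312,67a,      ·6666,6666,67a, 
$\bar{x}_9$    = ·7690,1393,66a,      ·7500,0000,00a, 
$\bar{x}_{10}$ = ·8466,9211,20a,      ·8333,3333,33a, 
$\bar{x}_{11}$ = ·9236,6412,21a,      ·9166,6666,67a. 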
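
The Γ-function expressions lend themselves to direct evaluation. The Python sketch below is a hedged illustration: it writes the curve as y ∝ (x/a)^p and puts k = 1/(p+1), so that k = 10/11 for Curve (iv) and k = 1/2 for the straight line; the symbol k, the generalisation from 11 to n, and the function names are assumptions of the example, not notation from the paper.

```python
import math

def mean_rank(q, n, k, a=1.0):
    """Mean of the q-th rank-variate for y proportional to (x/a)^p, with k = 1/(p+1)."""
    return a * math.exp(math.lgamma(n + 1) + math.lgamma(q + k)
                        - math.lgamma(q) - math.lgamma(n + 1 + k))

def mean_square_rank(q, n, k, a=1.0):
    """Mean square of the q-th rank-variate (same form with 2k in place of k)."""
    return a * a * math.exp(math.lgamma(n + 1) + math.lgamma(q + 2 * k)
                            - math.lgamma(q) - math.lgamma(n + 1 + 2 * k))

def sd_rank(q, n, k, a=1.0):
    """Standard deviation of the q-th rank-variate."""
    m = mean_rank(q, n, k, a)
    return math.sqrt(mean_square_rank(q, n, k, a) - m * m)

n = 11
k_iv = 10.0 / 11.0          # Curve (iv), power 0.1
for q in range(1, n + 1):
    print(q, mean_rank(q, n, k_iv), sd_rank(q, n, k_iv))
```

For q = n the mean reduces to n/(n + k), which for Curve (iv) is 121/131 in units of a.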





It will be seen that there has been a general shift of the mean variates of the 
ranks toward the terminal of the range as measured from the origin, the shift 
diminishing though not very rapidly for the higher ranks. 

If we compare the corresponding variabilities 

                     For Curve (iv)        For Rectangle 

$\sigma_{x_1}$    = ·0852,5455,75a,      ·0766,5551,76a, 
$\sigma_{x_2}$    = ·1096,8465,31a,      ·1033,6227,88a, 
$\sigma_{x_3}$    = ·1235,5539,22a,      ·1201,2813,67a, 
$\sigma_{x_4}$    = ·1314,3786,15a,      ·1307,4409,01a, 
$\sigma_{x_5}$    = ·1349,5161,74a,      ·1367,3544,24a, 
$\sigma_{x_6}$    = ·1347,8513,03a,      ·1386,7504,91a, 
$\sigma_{x_7}$    = ·1311,4557,68a,      ·1367,3544,24a, 
$\sigma_{x_8}$    = ·1239,9249,66a,      ·1307,4409,01a, 
$\sigma_{x_9}$    = ·1127,4126,88a,      ·1201,2813,67a, 
$\sigma_{x_{10}}$ = ·0961,4827,86a,      ·1033,6227,88a, 
$\sigma_{x_{11}}$ = ·0707,1504,73a,      ·0766,5551,76a, 


we see that the effect of lessening the frequency at the origin terminal of the 
range is to increase the variability of the first four ranks, and reduce that of the 
remaining seven ranks. Generally in skew limited range curves the variates of 
the ranks towards the end with lesser frequency have higher variabilities than 
those towards that end with greater frequency. 

We next note that while the rank variates have changed considerably in mean 
and variability, their correlation system has remained nearly symmetrical and very 
close to that for the rectangle. The reader will find this to be so by comparing the 
following values with those on p. 262: 


$r_{x_1,x_2}$ = ·6739,0962,    $r_{x_{10},x_{11}}$ = ·6741,8914, 
$r_{x_2,x_3}$ = ·7744,4136,    $r_{x_9,x_{10}}$ = ·7745,8181, 
$r_{x_3,x_4}$ = ·8164,0250,    $r_{x_8,x_9}$ = ·8164,7728, 
$r_{x_4,x_5}$ = ·8365,9743,    $r_{x_7,x_8}$ = ·8367,8372, 
$r_{x_5,x_6}$ = ·8451,0996,    $r_{x_6,x_7}$ = ·8452,7165. 


We now turn to the rank-intervals and their standard deviations. Here 

$\bar{x}_2-\bar{x}_1$       = ·0919,3899,76a,    $\sigma_{x_2-x_1}$       = ·0818,2571,71a, 
$\bar{x}_3-\bar{x}_2$       = ·0877,5995,23a,    $\sigma_{x_3-x_2}$       = ·0794,1031,13a, 
$\bar{x}_4-\bar{x}_3$       = ·0851,0055,97a,    $\sigma_{x_4-x_3}$       = ·0776,2300,67a, 
$\bar{x}_5-\bar{x}_4$       = ·0831,6645,61a,    $\sigma_{x_5-x_4}$       = ·0762,1768,17a, 
$\bar{x}_6-\bar{x}_5$       = ·0816,5433,87a,    $\sigma_{x_6-x_5}$       = ·0750,6506,20a, 
$\bar{x}_7-\bar{x}_6$       = ·0804,1715,18a,    $\sigma_{x_7-x_6}$       = ·0740,4964,31a, 
$\bar{x}_8-\bar{x}_7$       = ·0793,7277,32a,    $\sigma_{x_8-x_7}$       = ·0732,0729,69a, 
$\bar{x}_9-\bar{x}_8$       = ·0784,7080,99a,    $\sigma_{x_9-x_8}$       = ·0725,0892,79a, 
$\bar{x}_{10}-\bar{x}_9$    = ·0776,7817,54a,    $\sigma_{x_{10}-x_9}$    = ·0718,4942,03a, 
$\bar{x}_{11}-\bar{x}_{10}$ = ·0769,7201,01a,    $\sigma_{x_{11}-x_{10}}$ = ·0712,5524,56a. 

In the case of the rectangle the mean interrank intervals are all 

$= \frac{1}{12}a$ = ·0833,3333,33a, 

or the effect of reducing the frequency at the start of the range is to increase the 
rank-interval here, and this increase for samples of 11 holds for the first three 
intervals, after which the intervals are reduced. Again, the variability of the 
intervals in the rectangle is constant = ·0766,5551,76a; hence the lessened terminal 
frequency increases the variability of the intervals up to the third, after which it 
becomes slowly less. 

We may next consider the correlations between adjacent intervals; we have 

$r_{x_2-x_1,\,x_3-x_2}$ = -·1125,2098,    $r_{x_{10}-x_9,\,x_{11}-x_{10}}$ = -·0835,1692, 
$r_{x_3-x_2,\,x_4-x_3}$ = -·1017,8331,    $r_{x_9-x_8,\,x_{10}-x_9}$ = -·0845,0094, 
$r_{x_4-x_3,\,x_5-x_4}$ = -·0957,7702,    $r_{x_8-x_7,\,x_9-x_8}$ = -·0857,6100, 
$r_{x_5-x_4,\,x_6-x_5}$ = -·0919,1912,    $r_{x_7-x_6,\,x_8-x_7}$ = -·0873,3809, 
$r_{x_6-x_5,\,x_7-x_6}$ = -·0892,7658. 

All these correlations are for the rectangle -·0909,0909. Thus the effect of 
lessening the frequency at the start of the range is to increase the negative 
correlation between adjacent interrank intervals at that end, and decrease it at the 
other terminal; the power, 0·1, is so small, however, that there are not very great 
differences from the rectangular value. We have none of the positive correlations 
which marked the relation of interrank intervals for curves on the biquadratic above 
the rectangular point. 

Illustration V. Type IX. 

Curve (v): $y = 130\,(x/a)^{0\cdot 3}$. Range x = 0 to a. 
σ = ·2728,8953a, β₁ = ·049,424, β₂ = 1·928,072. 

[Fig. V] 

For this curve we find 

$\bar{x}_1$    = ·1375,6917,70a,    $\sigma_{x_1}$    = ·0998,0335,16a, 
$\bar{x}_2$    = ·2433,9162,08a,    $\sigma_{x_2}$    = ·1186,8650,72a, 
$\bar{x}_3$    = ·3370,0378,26a,    $\sigma_{x_3}$    = ·1271,1809,12a, 
$\bar{x}_4$    = ·4234,1500,89a,    $\sigma_{x_4}$    = ·1303,0265,11a, 
$\bar{x}_5$    = ·5048,4097,21a,    $\sigma_{x_5}$    = ·1299,1879,68a, 
$\bar{x}_6$    = ·5825,0881,40a,    $\sigma_{x_6}$    = ·1266,4953,94a, 
$\bar{x}_7$    = ·6571,8943,12a,    $\sigma_{x_7}$    = ·1207,3010,76a, 
$\bar{x}_8$    = ·7294,0805,00a,    $\sigma_{x_8}$    = ·1120,8788,29a, 
$\bar{x}_9$    = ·7995,4343,94a,    $\sigma_{x_9}$    = ·1003,0361,70a, 
$\bar{x}_{10}$ = ·8678,8048,55a,    $\sigma_{x_{10}}$ = ·0843,2473,69a, 
$\bar{x}_{11}$ = ·9346,4052,29a,    $\sigma_{x_{11}}$ = ·0612,1851,39a. 

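Further, 

$\bar{x}_2-\bar{x}_1$       = ·1058,2244,38a,    $\sigma_{x_2-x_1}$       = ·0901,2957,95a, 
$\bar{x}_3-\bar{x}_2$       = ·0936,1216,18a,    $\sigma_{x_3-x_2}$       = ·0830,9455,49a, 
$\bar{x}_4-\bar{x}_3$       = ·0864,1122,63a,    $\sigma_{x_4-x_3}$       = ·0781,6868,42a, 
$\bar{x}_5-\bar{x}_4$       = ·0814,2596,32a,    $\sigma_{x_5-x_4}$       = ·0744,6268,59a, 
$\bar{x}_6-\bar{x}_5$       = ·0776,6784,19a,    $\sigma_{x_6-x_5}$       = ·0715,1999,34a, 
$\bar{x}_7-\bar{x}_6$       = ·0746,8061,72a,    $\sigma_{x_7-x_6}$       = ·0691,1598,52a, 
$\bar{x}_8-\bar{x}_7$       = ·0722,1861,88a,    $\sigma_{x_8-x_7}$       = ·0670,8148,41a, 
$\bar{x}_9-\bar{x}_8$       = ·0701,3538,94a,    $\sigma_{x_9-x_8}$       = ·0653,2942,15a, 
$\bar{x}_{10}-\bar{x}_9$    = ·0683,3704,61a,    $\sigma_{x_{10}-x_9}$    = ·0637,9610,84a, 
$\bar{x}_{11}-\bar{x}_{10}$ = ·0667,6003,73a,    $\sigma_{x_{11}-x_{10}}$ = ·0624,3669,88a. 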

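The correlations between adjacent segments are 

$r_{x_2-x_1,\,x_3-x_2}$ = -·1389,2856,    $r_{x_{11}-x_{10},\,x_{10}-x_9}$ = -·0717,7055, 
$r_{x_3-x_2,\,x_4-x_3}$ = -·1131,2660,    $r_{x_{10}-x_9,\,x_9-x_8}$ = -·0738,7633, 
$r_{x_4-x_3,\,x_5-x_4}$ = -·0989,5414,    $r_{x_9-x_8,\,x_8-x_7}$ = -·0764,8528, 
$r_{x_5-x_4,\,x_6-x_5}$ = -·0901,7566,    $r_{x_8-x_7,\,x_7-x_6}$ = -·0798,0314, 
$r_{x_6-x_5,\,x_7-x_6}$ = -·0841,7487. 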

These values enable us to build up other correlations, as we have already indicated. 
For example, 

\[
r_{x_2-x_1,\;x_4-x_3\,\cdot\,x_3-x_2} = 0 \quad\text{gives}\quad
r_{x_2-x_1,\,x_4-x_3} = r_{x_2-x_1,\,x_3-x_2} \times r_{x_3-x_2,\,x_4-x_3},
\]

and thus $r_{x_2-x_1,\,x_4-x_3}$ = -·1389,2856 × (-·1131,2660) 

= ·0157,1652. 

Again, $r_{x_2-x_1,\;x_5-x_4\,\cdot\,x_4-x_3} = 0$, 

or $r_{x_2-x_1,\,x_5-x_4} = r_{x_2-x_1,\,x_4-x_3} \times r_{x_4-x_3,\,x_5-x_4}$ 

= ·0157,1652 × (-·0989,5414) 

= -·0015,5521, 

which sufficiently indicate the alternating in sign of the correlation of successive 
intervals with a given interval, and the rapid reduction in intensity of the 
association. 
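
The building up of non-adjacent interval correlations from adjacent ones is a simple running product. The Python sketch below repeats the arithmetic of the two examples with the three adjacent correlations quoted for Curve (v); the helper name `chain` is an assumption of the example.

```python
def chain(correlations):
    """Product of a run of adjacent-interval correlations."""
    out = 1.0
    for r in correlations:
        out *= r
    return out

# Adjacent interval correlations for Curve (v), as tabled above:
# r(x2-x1, x3-x2), r(x3-x2, x4-x3), r(x4-x3, x5-x4).
adjacent_v = [-0.13892856, -0.11312660, -0.09895414]

r_12_34 = chain(adjacent_v[:2])   # x2-x1 with x4-x3
r_12_45 = chain(adjacent_v[:3])   # x2-x1 with x5-x4
print(round(r_12_34, 8), round(r_12_45, 8))
```

Each further factor is negative and small, so the sign alternates and the magnitude falls by roughly a factor of ten at each step.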

Generally we note that the whole system of rank-variates is shifting away from 
the curtailed end of the range, that the intervals are becoming larger at this end 
and smaller at the other, while the variability of both rank-variates and intervals 
is becoming larger at the same end and smaller at the other also. 








Illustration VI. Type IX. 

The Line-Curve (vi): $y = 200\,(x/a)$. 

σ = ·235,7023a, β₁ = ·32, β₂ = 2·4. 

[Fig. VI] 

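$\bar{x}_1$    = ·2585,0974,08a,    $\sigma_{x_1}$    = ·1284,7586,25a, 
$\bar{x}_2$    = ·3877,6461,12a,    $\sigma_{x_2}$    = ·1276,9210,22a, 
$\bar{x}_3$    = ·4847,0576,40a,    $\sigma_{x_3}$    = ·1227,2050,50a, 
$\bar{x}_4$    = ·5654,9005,80a,    $\sigma_{x_4}$    = ·1164,2305,45a, 
$\bar{x}_5$    = ·6361,7631,53a,    $\sigma_{x_5}$    = ·1092,9941,68a, 
$\bar{x}_6$    = ·6997,9394,68a,    $\sigma_{x_6}$    = ·1014,3190,83a, 
$\bar{x}_7$    = ·7581,1010,90a,    $\sigma_{x_7}$    = ·0927,5395,38a, 
$\bar{x}_8$    = ·8122,6083,11a,    $\sigma_{x_8}$    = ·0830,6027,29a, 
$\bar{x}_9$    = ·8630,2713,31a,    $\sigma_{x_9}$    = ·0720,0116,40a, 
$\bar{x}_{10}$ = ·9109,7308,49a,    $\sigma_{x_{10}}$ = ·0588,3342,52a, 
$\bar{x}_{11}$ = ·9565,2173,91a,    $\sigma_{x_{11}}$ = ·0416,2726,63a. 

$\bar{x}_2-\bar{x}_1$       = ·1292,5487,04a,    $\sigma_{x_2-x_1}$       = ·1052,1861,17a, 
$\bar{x}_3-\bar{x}_2$       = ·0969,4115,28a,    $\sigma_{x_3-x_2}$       = ·0852,5889,73a, 
$\bar{x}_4-\bar{x}_3$       = ·0807,8429,40a,    $\sigma_{x_4-x_3}$       = ·0733,3934,65a, 
$\bar{x}_5-\bar{x}_4$       = ·0706,8625,73a,    $\sigma_{x_5-x_4}$       = ·0652,8945,01a, 
$\bar{x}_6-\bar{x}_5$       = ·0636,1763,15a,    $\sigma_{x_6-x_5}$       = ·0594,0163,75a, 
$\bar{x}_7-\bar{x}_6$       = ·0583,1616,22a,    $\sigma_{x_7-x_6}$       = ·0548,5874,26a, 
$\bar{x}_8-\bar{x}_7$       = ·0541,5072,21a,    $\sigma_{x_8-x_7}$       = ·0512,1772,01a, 
$\bar{x}_9-\bar{x}_8$       = ·0507,6630,19a,    $\sigma_{x_9-x_8}$       = ·0482,1559,27a, 
$\bar{x}_{10}-\bar{x}_9$    = ·0479,4595,18a,    $\sigma_{x_{10}-x_9}$    = ·0456,8534,36a, 
$\bar{x}_{11}-\bar{x}_{10}$ = ·0455,4865,42a,    $\sigma_{x_{11}-x_{10}}$ = ·0435,1521,65a. 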

The straight line takes its place in the general series of these curves with no 
simplification in the results, and only a slight lessening of the labour of determining 
them. 






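The correlations between successive interrank intervals are 

$r_{x_2-x_1,\,x_3-x_2}$ = -·1581,8003,    $r_{x_{11}-x_{10},\,x_{10}-x_9}$ = -·0479,4693, 
$r_{x_3-x_2,\,x_4-x_3}$ = -·1101,0458,    $r_{x_{10}-x_9,\,x_9-x_8}$ = -·0508,7161, 
$r_{x_4-x_3,\,x_5-x_4}$ = -·0875,7500,    $r_{x_9-x_8,\,x_8-x_7}$ = -·0545,2651, 
$r_{x_5-x_4,\,x_6-x_5}$ = -·0742,9493,    $r_{x_8-x_7,\,x_7-x_6}$ = -·0592,2374, 
$r_{x_6-x_5,\,x_7-x_6}$ = -·0654,9627. 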

This series of correlations still further emphasises the increased negative coefficients 
where the frequency is cut down, and the reduced coefficients at the mode. 

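If we find the correlation of a rank-variate with the interval on either side of that 
rank, we have the usual rule obeyed, i.e. 

\[
r_{x_{q+1}-x_q,\;x_q-x_{q-1}} = r_{x_q,\,x_q-x_{q-1}} \times r_{x_q,\,x_{q+1}-x_q}.
\]

For example, 

\[
r_{x_2,\,x_2-x_1} = \frac{\{x_2^2\} - \{x_1x_2\} - \bar{x}_2(\bar{x}_2-\bar{x}_1)}{\sigma_{x_2}\,\sigma_{x_2-x_1}}
\]

= ·4045,29516, 

\[
r_{x_2,\,x_3-x_2} = \frac{\{x_2x_3\} - \{x_2^2\} - \bar{x}_2(\bar{x}_3-\bar{x}_2)}{\sigma_{x_2}\,\sigma_{x_3-x_2}}
\]

= -·3910,22224, 
and accordingly their product 

= -·1581,8003 = $r_{x_2-x_1,\,x_3-x_2}$. 

The correlations between adjacent rank-variates are high and positive. Thus 

$r_{x_1,x_2}$ = ·6625,9970, $r_{x_{10},x_{11}}$ = ·6738,5188, and $r_{x_5,x_6}$ = ·8436,5341, 

which show comparatively little change from the values for $y = y_0\,(x/a)^{0\cdot 1}$: see p. 264. 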


Illustration VII. Type IX. 

Curve (vii): $y = 600\,(x/a)^5$. Range x = 0 to a. 

σ = ·1237,1791a, β₁ = 1·646,091, β₂ = 4·622,222. 






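$\bar{x}_1$    = ·6167,2164,29a,    $\sigma_{x_1}$    = ·1156,1344,27a, 
$\bar{x}_2$    = ·7195,0858,34a,    $\sigma_{x_2}$    = ·0789,0491,38a, 
$\bar{x}_3$    = ·7794,6763,20a,    $\sigma_{x_3}$    = ·0696,3680,97a, 
$\bar{x}_4$    = ·8227,7138,93a,    $\sigma_{x_4}$    = ·0591,3764,57a, 
$\bar{x}_5$    = ·8570,5353,06a,    $\sigma_{x_5}$    = ·0511,2186,64a, 
$\bar{x}_6$    = ·8856,2198,16a,    $\sigma_{x_6}$    = ·0444,0178,54a, 
$\bar{x}_7$    = ·9102,2259,22a,    $\sigma_{x_7}$    = ·0384,1688,84a, 
$\bar{x}_8$    = ·9318,9455,87a,    $\sigma_{x_8}$    = ·0328,0948,14a, 
$\bar{x}_9$    = ·9513,0902,86a,    $\sigma_{x_9}$    = ·0272,8356,92a, 
$\bar{x}_{10}$ = ·9689,2586,25a,    $\sigma_{x_{10}}$ = ·0214,8511,95a, 
$\bar{x}_{11}$ = ·9850,7462,69a,    $\sigma_{x_{11}}$ = ·0147,0424,20a. 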

The variability of the rank-variate now continuously falls. 

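$\bar{x}_2-\bar{x}_1$       = ·1027,8694,05a,    $\sigma_{x_2-x_1}$       = ·0904,5398,74a, 
$\bar{x}_3-\bar{x}_2$       = ·0599,5904,86a,    $\sigma_{x_3-x_2}$       = ·0559,8768,70a, 
$\bar{x}_4-\bar{x}_3$       = ·0433,0375,73a,    $\sigma_{x_4-x_3}$       = ·0413,0473,65a, 
$\bar{x}_5-\bar{x}_4$       = ·0342,8214,12a,    $\sigma_{x_5-x_4}$       = ·0330,5890,82a, 
$\bar{x}_6-\bar{x}_5$       = ·0285,6845,10a,    $\sigma_{x_6-x_5}$       = ·0277,3303,91a, 
$\bar{x}_7-\bar{x}_6$       = ·0246,0061,06a,    $\sigma_{x_7-x_6}$       = ·0239,8827,45a, 
$\bar{x}_8-\bar{x}_7$       = ·0216,7196,65a,    $\sigma_{x_8-x_7}$       = ·0212,0056,16a, 
$\bar{x}_9-\bar{x}_8$       = ·0194,1447,00a,    $\sigma_{x_9-x_8}$       = ·0190,3818,39a, 
$\bar{x}_{10}-\bar{x}_9$    = ·0176,1683,39a,    $\sigma_{x_{10}-x_9}$    = ·0173,0804,87a, 
$\bar{x}_{11}-\bar{x}_{10}$ = ·0161,4876,44a,    $\sigma_{x_{11}-x_{10}}$ = ·0158,8976,36a. 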

The approach to equality between the rank-intervals and their standard devia¬ 
tions is becoming sensible. This equality characterises the exponential curve. 

Next, taking the correlations of adjacent intervals, we find 

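$r_{x_2-x_1,\,x_3-x_2}$ = -·0779,3029,    $r_{x_{10}-x_9,\,x_{11}-x_{10}}$ = -·0164,5761, 
$r_{x_3-x_2,\,x_4-x_3}$ = -·0506,2378,    $r_{x_9-x_8,\,x_{10}-x_9}$ = -·0180,3372, 
$r_{x_4-x_3,\,x_5-x_4}$ = -·0380,8816,    $r_{x_8-x_7,\,x_9-x_8}$ = -·0200,0923, 
$r_{x_5-x_4,\,x_6-x_5}$ = -·0307,8122,    $r_{x_7-x_6,\,x_8-x_7}$ = -·0225,5633, 
$r_{x_6-x_5,\,x_7-x_6}$ = -·0259,6816. 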

The correlations of the interrank intervals are clearly growing smaller and 
smaller, although the first two intervals have still a considerably higher correlation 
than the last two. 

Illustration VIII. The last curve I propose to take to illustrate Type IX is 
Curve (viii): $y = 1200\,(x/a)^{11}$. See Fig. VIII, p. 270. 

σ = ·0712,1693a, β₁ = 2·509,630, β₂ = 6·154,167. 


Here the mean rank-variates and their variabilities are given by 


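$\bar{x}_1$    = ·7815,5832,80a,    $\sigma_{x_1}$    = ·0767,3475,60a, 
$\bar{x}_2$    = ·8466,8818,87a,    $\sigma_{x_2}$    = ·0512,6104,27a, 
$\bar{x}_3$    = ·8819,6686,32a,    $\sigma_{x_3}$    = ·0400,2604,35a, 
$\bar{x}_4$    = ·9064,6594,28a,    $\sigma_{x_4}$    = ·0330,2853,39a, 
$\bar{x}_5$    = ·9253,5064,99a,    $\sigma_{x_5}$    = ·0279,2320,31a, 
$\bar{x}_6$    = ·9407,7316,07a,    $\sigma_{x_6}$    = ·0238,2940,64a, 
$\bar{x}_7$    = ·9538,3945,46a,    $\sigma_{x_7}$    = ·0203,1961,98a, 
$\bar{x}_8$    = ·9651,9468,62a,    $\sigma_{x_8}$    = ·0171,3990,39a, 
$\bar{x}_9$    = ·9752,4879,76a,    $\sigma_{x_9}$    = ·0141,0005,29a, 
$\bar{x}_{10}$ = ·9842,7887,90a,    $\sigma_{x_{10}}$ = ·0109,9776,40a, 
$\bar{x}_{11}$ = ·9924,8120,30a,    $\sigma_{x_{11}}$ = ·0074,6247,54a. 

For the interrank intervals we have 

$\bar{x}_2-\bar{x}_1$       = ·0651,2986,07a,    $\sigma_{x_2-x_1}$       = ·0605,3750,06a, 
$\bar{x}_3-\bar{x}_2$       = ·0352,7867,45a,    $\sigma_{x_3-x_2}$       = ·0339,6729,41a, 
$\bar{x}_4-\bar{x}_3$       = ·0244,9907,95a,    $\sigma_{x_4-x_3}$       = ·0238,7810,83a, 
$\bar{x}_5-\bar{x}_4$       = ·0188,8470,71a,    $\sigma_{x_5-x_4}$       = ·0185,2035,02a, 
$\bar{x}_6-\bar{x}_5$       = ·0154,2251,08a,    $\sigma_{x_6-x_5}$       = ·0151,8161,46a, 
$\bar{x}_7-\bar{x}_6$       = ·0130,6629,39a,    $\sigma_{x_7-x_6}$       = ·0128,9444,19a, 
$\bar{x}_8-\bar{x}_7$       = ·0113,5523,16a,    $\sigma_{x_8-x_7}$       = ·0112,2599,78a, 
$\bar{x}_9-\bar{x}_8$       = ·0100,5411,13a,    $\sigma_{x_9-x_8}$       = ·0099,5309,10a, 
$\bar{x}_{10}-\bar{x}_9$    = ·0090,3008,15a,    $\sigma_{x_{10}-x_9}$    = ·0089,4874,05a, 
$\bar{x}_{11}-\bar{x}_{10}$ = ·0082,0232,40a,    $\sigma_{x_{11}-x_{10}}$ = ·0081,3527,77a. 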

Here is even a closer approach to the equality of the interrank interval and its 
variability. For the correlations of the successive rank-intervals we find 

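$r_{x_2-x_1,\,x_3-x_2}$ = -·0406,4928,    $r_{x_{10}-x_9,\,x_{11}-x_{10}}$ = -·0037,4562, 
$r_{x_3-x_2,\,x_4-x_3}$ = -·0263,4168,    $r_{x_9-x_8,\,x_{10}-x_9}$ = -·0055,0072, 
$r_{x_4-x_3,\,x_5-x_4}$ = -·0193,5992,    $r_{x_8-x_7,\,x_9-x_8}$ = -·0073,7364, 
$r_{x_5-x_4,\,x_6-x_5}$ = -·0150,1790,    $r_{x_7-x_6,\,x_8-x_7}$ = -·0094,5803, 
$r_{x_6-x_5,\,x_7-x_6}$ = -·0119,1094. 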

With regard to the interrank interval correlations we see that they are still 
sensibly negative and very small. The surprising part is that so little correlation 
should exist between even adjacent rank-intervals, and how close in magnitude 
the interval and its standard deviation are, right away from the rectangular to the 
exponential population. 

We now reach the Exponential Curve itself. Here the range becomes unlimited 
in one direction, and with the following curves a is not a physical character of the 
distribution, it denotes the distance from the origin to the start of the curve, and 
we shall replace it in our examples by its value in terms of the standard deviation. 

Illustration IX. Type X. This is the Exponential Curve: 

Curve (ix): $y = \dfrac{N}{\sigma}\,e^{-x/\sigma}$, (β₁ = 4, β₂ = 9), 

the transition curve from Type IX with finite range to Type XI with infinite range. 
This type has been fully discussed theoretically in the first section of the present 
paper. I add a graph here for the purpose of comparison with adjacent curves*. 

[Fig. IX: Curve (ix), $y = 40\cdot 825\,e^{-\cdot 40825x}$.] 

* For the purpose of plotting σ was taken = 2·449,4897; the graph is therefore directly comparable 
with the last curve ($y = 600\,(10/x)^7$) of our series. 





To carry out the sequence of our curves we give the results for a sample of 11 taken 
from this exponential curve. We have 

$\bar{x}_1$    = ·0909,0909,09σ,     $\sigma_{x_1}$    = ·0909,0909,09σ, 
$\bar{x}_2$    = ·1909,0909,09σ,     $\sigma_{x_2}$    = ·1351,4607,95σ, 
$\bar{x}_3$    = ·3020,2020,20σ,     $\sigma_{x_3}$    = ·1749,5754,29σ, 
$\bar{x}_4$    = ·4270,2020,20σ,     $\sigma_{x_4}$    = ·2150,2358,43σ, 
$\bar{x}_5$    = ·5698,7734,49σ,     $\sigma_{x_5}$    = ·2581,5364,63σ, 
$\bar{x}_6$    = ·7365,4401,15σ,     $\sigma_{x_6}$    = ·3072,8013,74σ, 
$\bar{x}_7$    = ·9365,4401,15σ,     $\sigma_{x_7}$    = ·3666,3480,86σ, 
$\bar{x}_8$    = 1·1865,4401,15σ,    $\sigma_{x_8}$    = ·4437,5791,02σ, 
$\bar{x}_9$    = 1·5198,7734,49σ,    $\sigma_{x_9}$    = ·5550,0648,10σ, 
$\bar{x}_{10}$ = 2·0198,7734,49σ,    $\sigma_{x_{10}}$ = ·7470,1552,46σ, 
$\bar{x}_{11}$ = 3·0198,7734,49σ,    $\sigma_{x_{11}}$ = 1·2482,1159,82σ. 
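
The exponential values above follow a simple pattern: the successive intervals have means σ/11, σ/10, ..., σ/1 and are uncorrelated, with standard deviations equal to their means, so that the mean and variance of a rank-variate are partial sums of reciprocals and of squared reciprocals. The Python sketch below evaluates these sums; the function names and the choice σ = 1 are assumptions of the example.

```python
import math

def exp_rank_mean(q, n, sigma=1.0):
    """Mean of the q-th rank-variate in a sample of n from an exponential
    parent, measured from the start of the curve."""
    return sigma * sum(1.0 / (n - j) for j in range(q))

def exp_rank_sd(q, n, sigma=1.0):
    """Standard deviation of the same rank-variate."""
    return sigma * math.sqrt(sum(1.0 / (n - j) ** 2 for j in range(q)))

n = 11
for q in range(1, n + 1):
    print(q, round(exp_rank_mean(q, n), 10), round(exp_rank_sd(q, n), 10))
```

The printed columns can be set beside the table for comparison, rank by rank.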

Now these results cannot be compared directly with those on p. 270 for the curve 
$y = y_0\,(x/a)^{11}$. We have first to notice that the rank-variates are measured in the 
present curve from the maximum frequency end of the range, and are in terms of 
the range a and not the standard deviation σ. Accordingly for comparison we must 
subtract each rank-variate from the range a, renumber them in reverse order, and 
substitute 14·0416,0480σ for a. We then find for $y = y_0\,(x/a)^{11}$ the two series 

$\bar{x}_1'$    = ·1055,7597,60σ,     $\sigma_{x_1'}$    = ·1047,8513,04σ, 
$\bar{x}_2'$    = ·2207,4976,81σ,     $\sigma_{x_2'}$    = ·1544,2625,58σ, 
$\bar{x}_3'$    = ·3475,4660,24σ,     $\sigma_{x_3'}$    = ·1979,8737,05σ, 
$\bar{x}_4'$    = ·4887,2246,13σ,     $\sigma_{x_4'}$    = ·2406,7175,69σ, 
$\bar{x}_5'$    = ·6481,6813,59σ,     $\sigma_{x_5'}$    = ·2853,2007,09σ, 
$\bar{x}_6'$    = ·8316,3987,10σ,     $\sigma_{x_6'}$    = ·3346,0310,73σ, 
$\bar{x}_7'$    = 1·0481,9667,27σ,    $\sigma_{x_7'}$    = ·3920,8658,27σ, 
$\bar{x}_8'$    = 1·3133,6826,65σ,    $\sigma_{x_8'}$    = ·4637,7362,01σ, 
$\bar{x}_9'$    = 1·6573,4660,25σ,    $\sigma_{x_9'}$    = ·5620,2988,45σ, 
$\bar{x}_{10}'$ = 2·1527,4386,40σ,    $\sigma_{x_{10}'}$ = ·7197,8730,32σ, 
$\bar{x}_{11}'$ = 3·0672,7163,01σ,    $\sigma_{x_{11}'}$ = 1·0774,7911,82σ. 

These values, although not very close to those above—for the curve with 11 as 
power is only a very rough approach to the infinity of the exponential—yet suffice 
to indicate that the rank-variates and their standard deviations are approaching 
the exponential values. 
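
The change of origin and unit described above is mechanical, and may be sketched as follows in Python. The input lists are the means and standard deviations tabled for Curve (viii) in units of a; the constant 14.04160480 is the factor quoted in the text, and the function name is an assumption of the example.

```python
A_IN_SIGMA = 14.04160480   # a expressed in units of sigma, as quoted in the text

means_a = [0.7815583280, 0.8466881887, 0.8819668632, 0.9064659428,
           0.9253506499, 0.9407731607, 0.9538394546, 0.9651946862,
           0.9752487976, 0.9842788790, 0.9924812030]
sds_a = [0.0767347560, 0.0512610427, 0.0400260435, 0.0330285339,
         0.0279232031, 0.0238294064, 0.0203196198, 0.0171399039,
         0.0141000529, 0.0109977640, 0.0074624754]

def reverse_and_rescale(means, sds, factor=A_IN_SIGMA):
    """Measure each rank-variate back from the end of the range (taken as 1),
    renumber in reverse order, and express the results in units of sigma."""
    new_means = [(1.0 - m) * factor for m in reversed(means)]
    new_sds = [s * factor for s in reversed(sds)]
    return new_means, new_sds

means_sigma, sds_sigma = reverse_and_rescale(means_a, sds_a)
for q, (m, s) in enumerate(zip(means_sigma, sds_sigma), start=1):
    print(q, round(m, 8), round(s, 8))
```

The standard deviations are unchanged by the reversal apart from the change of unit, since subtracting from a constant does not alter variability.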






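Turning now to the interrank intervals we find 

$\bar{x}_2-\bar{x}_1$       = ·1000,0000,00σ  = $\sigma_{x_2-x_1}$, 
$\bar{x}_3-\bar{x}_2$       = ·1111,1111,11σ  = $\sigma_{x_3-x_2}$, 
$\bar{x}_4-\bar{x}_3$       = ·1250,0000,00σ  = $\sigma_{x_4-x_3}$, 
$\bar{x}_5-\bar{x}_4$       = ·1428,5714,29σ  = $\sigma_{x_5-x_4}$, 
$\bar{x}_6-\bar{x}_5$       = ·1666,6666,67σ  = $\sigma_{x_6-x_5}$, 
$\bar{x}_7-\bar{x}_6$       = ·2000,0000,00σ  = $\sigma_{x_7-x_6}$, 
$\bar{x}_8-\bar{x}_7$       = ·2500,0000,00σ  = $\sigma_{x_8-x_7}$, 
$\bar{x}_9-\bar{x}_8$       = ·3333,3333,33σ  = $\sigma_{x_9-x_8}$, 
$\bar{x}_{10}-\bar{x}_9$    = ·5000,0000,00σ  = $\sigma_{x_{10}-x_9}$, 
$\bar{x}_{11}-\bar{x}_{10}$ = 1·0000,0000,00σ = $\sigma_{x_{11}-x_{10}}$. 

The corresponding values for $y = y_0\,(x/a)^{11}$ are 

$\bar{x}_2'-\bar{x}_1'$       = ·1151,7379,20σ,    $\sigma_{x_2'-x_1'}$       = ·1142,3235,44σ, 
$\bar{x}_3'-\bar{x}_2'$       = ·1267,9683,57σ,    $\sigma_{x_3'-x_2'}$       = ·1256,5467,76σ, 
$\bar{x}_4'-\bar{x}_3'$       = ·1411,7585,75σ,    $\sigma_{x_4'-x_3'}$       = ·1397,5737,04σ, 
$\bar{x}_5'-\bar{x}_4'$       = ·1594,4567,45σ,    $\sigma_{x_5'-x_4'}$       = ·1576,3102,46σ, 
$\bar{x}_6'-\bar{x}_5'$       = ·1834,7173,51σ,    $\sigma_{x_6'-x_5'}$       = ·1810,5865,73σ, 
$\bar{x}_7'-\bar{x}_6'$       = ·2165,5680,17σ,    $\sigma_{x_7'-x_6'}$       = ·2131,7423,24σ, 
$\bar{x}_8'-\bar{x}_7'$       = ·2651,7159,39σ,    $\sigma_{x_8'-x_7'}$       = ·2600,5543,83σ, 
$\bar{x}_9'-\bar{x}_8'$       = ·3440,0639,23σ,    $\sigma_{x_9'-x_8'}$       = ·3352,8780,26σ, 
$\bar{x}_{10}'-\bar{x}_9'$    = ·4953,6920,52σ,    $\sigma_{x_{10}'-x_9'}$    = ·4769,5531,99σ, 
$\bar{x}_{11}'-\bar{x}_{10}'$ = ·9145,2776,46σ,    $\sigma_{x_{11}'-x_{10}'}$ = ·8500,4365,90σ. 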

These series show the approach to the exponential values and at the same time 
indicate the influence of the limited range on the higher intervals at the tail. 
While for the exponential curve the interval correlations are all zero, we see them 
approaching zero for the 11th power curve (p. 271). To compare with curves beyond 
the exponential the correlations on p. 275 must be read in reverse order, »*«„-«», 
being read as r*,^, 


Illustration X. Type XI. 

Curve (x): $y = y_0\,(a/x)^{20}$, with range from x = a to ∞, 

σ = ·0587,3268,22a*, β₁ = 5·592,1053, β₂ = 12·347,3684, 

and $y_0 = 19N/a$. 

The curve is shown on the same scale as the previous curves in Fig. X. 

* a = 17·026,2954σ is the value to be used in passing from the Γ-function expressions with a to those 
in σ. 




We give first the rank-variates measured no longer from the origin of x, but 
reduced to the mode, and given in terms of σ, not a. 

$\bar{x}_1'$    = ·0818,5718,94σ,     $\sigma_{x_1'}$    = ·0822,5171,24σ, 
$\bar{x}_2'$    = ·1723,7651,49σ,     $\sigma_{x_2'}$    = ·1229,5625,72σ, 
$\bar{x}_3'$    = ·2735,4517,36σ,     $\sigma_{x_3'}$    = ·1601,6667,75σ, 
$\bar{x}_4'$    = ·3881,1366,45σ,     $\sigma_{x_4'}$    = ·1982,3306,93σ, 
$\bar{x}_5'$    = ·5200,4099,52σ,     $\sigma_{x_5'}$    = ·2399,3092,14σ, 
$\bar{x}_6'$    = ·6753,1830,81σ,     $\sigma_{x_6'}$    = ·2883,8392,43σ, 
$\bar{x}_7'$    = ·8636,3334,75σ,     $\sigma_{x_7'}$    = ·3480,6540,02σ, 
$\bar{x}_8'$    = 1·1021,6573,24σ,    $\sigma_{x_8'}$    = ·4276,2048,79σ, 
$\bar{x}_9'$    = 1·4258,8825,16σ,    $\sigma_{x_9'}$    = ·5470,8444,81σ, 
$\bar{x}_{10}'$ = 1·9245,9591,83σ,    $\sigma_{x_{10}'}$ = ·7597,3959,28σ, 
$\bar{x}_{11}'$ = 2·9774,2321,29σ,    $\sigma_{x_{11}'}$ = 1·3725,7470,15σ. 

The tendency when compared with the exponential curve is to draw the ranked 
individuals slightly towards mediocrity, and to reduce the variability of the lowest 
ranks while increasing that of the two highest ranks. Both these tendencies will 
be found still further emphasised in the following curve, $y = y_0\,(a/x)^7$. 

We turn next to the rank-intervals and find* 

$\bar{x}_2-\bar{x}_1$       = ·0905,1932,59σ,     $\sigma_{x_2-x_1}$       = ·0910,0163,12σ, 
$\bar{x}_3-\bar{x}_2$       = ·1011,6865,83σ,     $\sigma_{x_3-x_2}$       = ·1017,7070,00σ, 
$\bar{x}_4-\bar{x}_3$       = ·1145,6848,06σ,     $\sigma_{x_4-x_3}$       = ·1153,3956,09σ, 
$\bar{x}_5-\bar{x}_4$       = ·1319,2734,13σ,     $\sigma_{x_5-x_4}$       = ·1329,4770,10σ, 
$\bar{x}_6-\bar{x}_5$       = ·1552,7731,32σ,     $\sigma_{x_6-x_5}$       = ·1566,8661,61σ, 
$\bar{x}_7-\bar{x}_6$       = ·1883,1503,95σ,     $\sigma_{x_7-x_6}$       = ·1903,7912,02σ, 
$\bar{x}_8-\bar{x}_7$       = ·2385,3238,33σ,     $\sigma_{x_8-x_7}$       = ·2418,2458,76σ, 
$\bar{x}_9-\bar{x}_8$       = ·3237,2252,02σ,     $\sigma_{x_9-x_8}$       = ·3297,3592,92σ, 
$\bar{x}_{10}-\bar{x}_9$    = ·4987,0766,63σ,     $\sigma_{x_{10}-x_9}$    = ·5128,1004,50σ, 
$\bar{x}_{11}-\bar{x}_{10}$ = 1·0528,2729,54σ,    $\sigma_{x_{11}-x_{10}}$ = 1·1147,3011,17σ. 

These show the great increase of interval and of its variability in the last interval 
at the tail. The Galton Ratio is over three, and we obtain much the same value of 
division of prize money, 75 % and 25 %, as determined for the normal curve. 

The adjacent interval correlations are as follows: 

$r_{x_2-x_1,\,x_3-x_2}$ = ·0052,8264,    $r_{x_{10}-x_9,\,x_{11}-x_{10}}$ = ·0263,4030, 
$r_{x_3-x_2,\,x_4-x_3}$ = ·0058,9360,    $r_{x_9-x_8,\,x_{10}-x_9}$ = ·0179,0025, 
$r_{x_4-x_3,\,x_5-x_4}$ = ·0066,5633,    $r_{x_8-x_7,\,x_9-x_8}$ = ·0134,5797, 
$r_{x_5-x_4,\,x_6-x_5}$ = ·0077,7444,    $r_{x_7-x_6,\,x_8-x_7}$ = ·0107,5295, 
$r_{x_6-x_5,\,x_7-x_6}$ = ·0089,5670. 

We have here a very low series of adjacent interval correlations, although we have 
departed, to judge by this curve's β₁ and specially its β₂, to a considerable distance 
from the exponential curve. 

Illustration XI. Type XI. 

Curve (xi): $y = y_0\,(a/x)^7$, with range from x = a to x = ∞. 

Here σ = a × ·2449,4897†, β₁ = 14·518,5185, β₂ = 38·666,6667 
and $y_0 = 6N/a$. 

* In the case of the intervals we need not distinguish between $\bar{x}_{q+1} - \bar{x}_q$ and $\bar{x}'_{q+1} - \bar{x}'_q$. 
† Conversely a = 4·0824,8290σ. 

This curve is graphed in Fig. XI. 

[Fig. XI] 

The following values were obtained for a sample of 11 from this curve as parent 
population: 

$\bar{x}_1$    = 1·0153,8461,54a,    $\sigma_{x_1}$    = ·0156,2315,08a, 
$\bar{x}_2$    = 1·0325,9452,41a,    $\sigma_{x_2}$    = ·0236,4093,03a, 
$\bar{x}_3$    = 1·0520,7743,97a,    $\sigma_{x_3}$    = ·0312,1814,98a, 
$\bar{x}_4$    = 1·0744,6206,60a,    $\sigma_{x_4}$    = ·0392,4032,14a, 
$\bar{x}_5$    = 1·1006,6845,79a,    $\sigma_{x_5}$    = ·0483,5190,47a, 
$\bar{x}_6$    = 1·1321,1612,81a,    $\sigma_{x_6}$    = ·0593,5116,22a, 
$\bar{x}_7$    = 1·1711,5461,53a,    $\sigma_{x_7}$    = ·0735,3252,88a, 
$\bar{x}_8$    = 1·2220,7438,12a,    $\sigma_{x_8}$    = ·0934,1902,87a, 
$\bar{x}_9$    = 1·2939,6110,95a,    $\sigma_{x_9}$    = ·1250,2667,20a, 
$\bar{x}_{10}$ = 1·4115,9393,76a,    $\sigma_{x_{10}}$ = ·1880,5047,50a, 
$\bar{x}_{11}$ = 1·6939,1272,52a,    $\sigma_{x_{11}}$ = ·4154,5194,48a. 

Replacing a by σ and measuring from the start of the curve, not from the origin, 
we find the following series: 

$\bar{x}_1'$    = ·0628,0742,94σ,     $\sigma_{x_1'}$    = ·0637,8124,60σ, 
$\bar{x}_2'$    = ·1330,6658,73σ,     $\sigma_{x_2'}$    = ·0965,1369,37σ, 
$\bar{x}_3'$    = ·2126,0525,71σ,     $\sigma_{x_3'}$    = ·1274,4756,27σ, 
$\bar{x}_4'$    = ·3039,9011,11σ,     $\sigma_{x_4'}$    = ·1601,9794,11σ, 
$\bar{x}_5'$    = ·4109,7725,79σ,     $\sigma_{x_5'}$    = ·1973,9582,41σ, 
$\bar{x}_6'$    = ·5393,6183,38σ,     $\sigma_{x_6'}$    = ·2423,0010,48σ, 
$\bar{x}_7'$    = ·6987,3579,02σ,     $\sigma_{x_7'}$    = ·3001,9529,14σ, 
$\bar{x}_8'$    = ·9066,1486,38σ,     $\sigma_{x_8'}$    = ·3813,8158,72σ, 
$\bar{x}_9'$    = 1·2000,9120,28σ,    $\sigma_{x_9'}$    = ·5104,1925,05σ, 
$\bar{x}_{10}'$ = 1·6803,2521,20σ,    $\sigma_{x_{10}'}$ = ·7677,1284,85σ, 
$\bar{x}_{11}'$ = 2·8328,8683,47σ,    $\sigma_{x_{11}'}$ = 1·6960,7546,04σ. 








It will be seen at once how marked is becoming the influence of the tail in 
separating both variates and variabilities of the highest ranks. 

We have then the following values for the rank-intervals and their standard 
deviations*: 

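$\bar{x}_2-\bar{x}_1$       = ·0702,5915,80σ,     $\sigma_{x_2-x_1}$       = ·0714,7689,04σ, 
$\bar{x}_3-\bar{x}_2$       = ·0795,3866,94σ,     $\sigma_{x_3-x_2}$       = ·0810,9551,65σ, 
$\bar{x}_4-\bar{x}_3$       = ·0913,8485,45σ,     $\sigma_{x_4-x_3}$       = ·0934,3079,12σ, 
$\bar{x}_5-\bar{x}_4$       = ·1069,8714,68σ,     $\sigma_{x_5-x_4}$       = ·1097,7184,92σ, 
$\bar{x}_6-\bar{x}_5$       = ·1283,8457,58σ,     $\sigma_{x_6-x_5}$       = ·1323,5426,59σ, 
$\bar{x}_7-\bar{x}_6$       = ·1593,7395,64σ,     $\sigma_{x_7-x_6}$       = ·1654,0541,75σ, 
$\bar{x}_8-\bar{x}_7$       = ·2078,7907,36σ,     $\sigma_{x_8-x_7}$       = ·2179,4132,88σ, 
$\bar{x}_9-\bar{x}_8$       = ·2934,7633,90σ,     $\sigma_{x_9-x_8}$       = ·3129,9186,31σ, 
$\bar{x}_{10}-\bar{x}_9$    = ·4802,3400,92σ,     $\sigma_{x_{10}-x_9}$    = ·5305,5302,65σ, 
$\bar{x}_{11}-\bar{x}_{10}$ = 1·1525,6162,23σ,    $\sigma_{x_{11}-x_{10}}$ = 1·4323,1831,01σ. 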

Here the Galton Ratio is almost 3*4, and the high values of the highest interval 
and its variability are conspicuous. 

Finally, the correlations, all plus, are as follows: 

$r_{x_2-x_1,\,x_3-x_2}$ = ·0168,5446,    $r_{x_{10}-x_9,\,x_{11}-x_{10}}$ = ·0803,1649, 
$r_{x_3-x_2,\,x_4-x_3}$ = ·0189,6107,    $r_{x_9-x_8,\,x_{10}-x_9}$ = ·0583,1446, 
$r_{x_4-x_3,\,x_5-x_4}$ = ·0215,8128,    $r_{x_8-x_7,\,x_9-x_8}$ = ·0443,3855, 
$r_{x_5-x_4,\,x_6-x_5}$ = ·0249,2747,    $r_{x_7-x_6,\,x_8-x_7}$ = ·0354,3929, 
$r_{x_6-x_5,\,x_7-x_6}$ = ·0293,4601. 

These are all small and confirm our previous conclusion that we can move a long 
way from the (β₁, β₂) of the exponential curve and yet find that the correlations 
between adjacent rank-intervals, and therefore a fortiori between non-adjacent 
rank-intervals have not reached any very sensible values. 

Summary . 

It may be asked: Why in this paper have so much labour and space been 
devoted to the numerical illustration of a sample with definite size from a particular 
category of frequency curves ? The answer is threefold. 

(i) This system of curves has a particularly practical value. It is not only that 
the Exponential Curve as one of the series is very important as the curve describing 
the random distribution of the occurrence of events in time or space. That curve 
is deduced by the conception of a happening at a point of time or a point of space. 
But very often an occurrence cannot so happen. It happens within a certain unit 
of time, or a certain unit of space. It is a purely mathematical conception to find 
a "thing” at a point in space, or to mark a "happening” occurring at an instant 
of time! You may try to make a record of this kind and you will find it impossible; 

* In the case of the intervals we need not distinguish between $\bar{x}_{q+1} - \bar{x}_q$ and $\bar{x}'_{q+1} - \bar{x}'_q$. 







you can only measure your space or time in definite units, however small, and more 
generally the happening did or did not occur within a quite appreciable measure 
of space or time. In such cases we cannot proceed as Whitworth has done by 
taking our interval indefinitely small, and thus reaching the exponential limit. 
The actual frequency curve then falls into the biquadratic series of curves, but is 
not the exponential curve. 

(ii) It is the writer’s opinion, and this is based on a fairly long statistical 
experience, that mathematical formulae convey little to the mind, until they have 
been numerically illustrated, and this illustration brings out new points, which the 
formulae actually contain, but which might have been passed over, but for their 
numerical illustration. Certain broad principles of this kind arise from our work. 
For example: 

(a) The correlation between adjacent rank-variates is high, but the correlation
between adjacent (and a fortiori non-adjacent) rank-intervals is small, and for 
many purposes negligible. 

(b) The partial correlation of any two rank-variates, or any two rank-intervals
for a constant rank-variate or a constant rank-interval lying between them is zero. 

(c) The order of the variabilities of rank-intervals as measured by their standard 
deviations is much the same as the order of the intervals themselves. There is 
equality in the case of the exponential curve, and this property extends approxi¬ 
mately for a considerable range on either side of it. 

(d) Galton’s Ratio, namely 2 to 1, for the ratio of the first rank-interval to the
second in the case of the end of the curve with lesser frequency is approximately 
true for a large number of curves. 

(e) In samples from a curve of finite range the correlations of adjacent interrank 
intervals are negative; in samples of one with unlimited range they are positive. 

(f) In cases where there is much predominance of mediocrity, the interval
between the first and second ranks may be ten or more times the interval between 
mediocre individuals. This is but a special illustration of the great principle 
(which ought to be generally recognised, but frequently is not) that differences in 
physical or mental ability between specially able individuals will invariably be 
found to be much greater than those between ordinary individuals. Several 
characteristics of the so-called “genius” are involved in this principle. 

(iii) The statistical characters of the curves dealt with have a very wide range. 
Thus 

Curve (i): $\beta_1 = 2.7500$, $\beta_2 = 4.7143$,
Curve (ii): $\beta_1 = 1.4286$, $\beta_2 = 2.1428$,
Curve (iii): $\beta_1 = 0.0000$, $\beta_2 = 1.8000$,
Curve (iv): $\beta_1 = 0.0067$, $\beta_2 = 1.8316$,
Curve (v): $\beta_1 = 0.0494$, $\beta_2 = 1.9281$,
Curve (vi): $\beta_1 = 0.3200$, $\beta_2 = 2.4000$,
Curve (vii): $\beta_1 = 1.6461$, $\beta_2 = 4.6222$,
Curve (viii): $\beta_1 = 2.6096$, $\beta_2 = 6.1542$,
Curve (ix): $\beta_1 = 4.0000$, $\beta_2 = 9.0000$,
Curve (x): $\beta_1 = 5.6921$, $\beta_2 = 12.3474$,
Curve (xi): $\beta_1 = 14.6185$, $\beta_2 = 38.6667$.

Many of these curves are of a most skew character, and they are very widely
spread over the $\beta_1$, $\beta_2$ plane. It is not suggested that the ranking characters of all
frequency curves can be judged by these illustrations, but it is certain that the
general properties we have stated for these curves are likely to hold for curves
with their $\beta_1$, $\beta_2$ not very far from the biquadratic, and that some of these properties
may possibly hold for samples from all continuous frequency curves.

Lastly, I venture to think that a more intensive study, analytical and experi¬ 
mental, of the individuals in a sample, rather than of the statistical constants of 
the sample as a whole may suggest results of considerable practical value. 
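
Principle (d) above can be tried numerically for the exponential curve itself. The following is only a sketch under stated assumptions: the sample size 20, the number of trials, the seed and the function name `mean_top_intervals` are illustrative choices, not taken from the paper. It relies on the standard result that, for an exponential parent, the interval between the k-th and (k+1)-th largest individuals has mean 1/k in units of the population mean, so the first two intervals should average about 1 and 1/2.

```python
import random

def mean_top_intervals(n, trials, seed=1932):
    """Average the first and second rank-intervals at the upper end of
    samples of size n drawn from a unit exponential curve."""
    rng = random.Random(seed)
    first_total = 0.0
    second_total = 0.0
    for _ in range(trials):
        ranked = sorted((rng.expovariate(1.0) for _ in range(n)), reverse=True)
        first_total += ranked[0] - ranked[1]    # interval between ranks 1 and 2
        second_total += ranked[1] - ranked[2]   # interval between ranks 2 and 3
    return first_total / trials, second_total / trials

first_mean, second_mean = mean_top_intervals(n=20, trials=20000)
galton_ratio = first_mean / second_mean
print(round(first_mean, 2), round(second_mean, 2), round(galton_ratio, 2))
```

The printed ratio should lie close to Galton's 2 to 1, subject to sampling error.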



MISCELLANEA. 


(i) A Comparison of the Accuracy of Two Types 
of Quadrature Formulae. 

By E. S. MARTIN. 


The object of this paper is to examine and compare the accuracy of two types of approximate
quadrature formulae as applied to commonly occurring frequency curves*.

Let the area between the curve whose equation is $y = f(x)$, and the axis of x be divided into
strips by a series of ordinates distant h apart, and the mid-ordinates of these strips calculated;
then it is required to calculate the area of each strip.


Let $x_r$ be the abscissa corresponding to the mid-ordinate $y_r$ of any strip; by fitting a “parabola” of the form

$$y = a + bx + cx^2 + dx^3 + ex^4$$

to the five consecutive mid-ordinates whose middle one is $y_r$, the following formula is obtained
for the area of the strip with mid-ordinate $y_r$:

$$n_r = \frac{h}{5760}\left[5178\,y_r + 308\,(y_{r-1} + y_{r+1}) - 17\,(y_{r-2} + y_{r+2})\right] \qquad \text{(i)}.$$


If the curve has a finite terminal, in order to calculate the areas of the two terminal strips,
we may either (a) assign zero values to one or two of the ordinates in the above formula, or
(b) fit a “parabola” to the five terminal mid-ordinates. If (b) is adopted, the following formulae
are obtained for the first two strips, with equivalent formulae for the last two strips:

$$\left.\begin{aligned}
n_1 &= \frac{h}{5760}\left[6463\,y_1 - 2092\,y_2 + 2298\,y_3 - 1132\,y_4 + 223\,y_5\right]\\
n_2 &= \frac{h}{5760}\left[223\,y_1 + 5348\,y_2 + 138\,y_3 + 68\,y_4 - 17\,y_5\right]
\end{aligned}\right\} \qquad \text{(i)}'.$$
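
The numerical coefficients of the central and terminal formulae can be checked by exact arithmetic: a rule obtained by fitting a quartic through five mid-ordinates must reproduce the strip area exactly for every power of x up to the fourth. The sketch below takes h = 1, so that the five mid-ordinates stand at x = 1/2, 3/2, ..., 9/2; the names `rule_area`, `true_area` and `is_exact` are illustrative helpers, not part of the paper.

```python
from fractions import Fraction

DIVISOR = 5760
CENTRAL = [-17, 308, 5178, 308, -17]        # central formula, strip about the third mid-ordinate
FIRST = [6463, -2092, 2298, -1132, 223]     # terminal formula, first strip
SECOND = [223, 5348, 138, 68, -17]          # terminal formula, second strip

# With h = 1 the five mid-ordinates sit at x = 1/2, 3/2, ..., 9/2.
MID_ABSCISSAE = [Fraction(2 * j + 1, 2) for j in range(5)]

def rule_area(weights, power):
    """Area given by a five-ordinate rule applied to y = x**power, with h = 1."""
    return sum(Fraction(w, DIVISOR) * x ** power
               for w, x in zip(weights, MID_ABSCISSAE))

def true_area(lower, upper, power):
    """Exact integral of x**power between lower and upper."""
    return (Fraction(upper) ** (power + 1)
            - Fraction(lower) ** (power + 1)) / (power + 1)

def is_exact(weights, lower, upper, max_power=4):
    """True if the rule reproduces the strip area for every power up to max_power."""
    return all(rule_area(weights, k) == true_area(lower, upper, k)
               for k in range(max_power + 1))

# Central strip runs from 2 to 3; terminal strips from 0 to 1 and from 1 to 2.
print(is_exact(CENTRAL, 2, 3), is_exact(FIRST, 0, 1), is_exact(SECOND, 1, 2))
# prints: True True True
```

Each set of weights sums to 5760, which is the case of the zeroth power.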

The second type of formula is obtained on the assumption that $f(x)$ can be expanded by
Taylor’s Theorem, and has the general form

$$n_r = h\left[f(x_r) + \frac{h^2}{24}\,f''(x_r)\right] \qquad \text{(ii)},$$

neglecting terms in $h^5$, etc.

This, of course, requires to be put in a form suitable for calculation for each curve considered. 
The equations of the curves here tabulated, with the corresponding forms of (ii), are: 

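Normal Curve $y = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 x^2}$ (Tables Ia-c),

Curve of type $y = \dfrac{x^{m_1}(1-x)^{m_2}}{B(m_1+1,\,m_2+1)}$ (Tables IIa-c),

$$n_r = h\,y\left[1 + \frac{h^2}{24}\,\frac{\{m_1 - (m_1+m_2)\,x\}^2 - m_1(1-x)^2 - m_2x^2}{x^2(1-x)^2}\right],$$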



* Biometrika, Vol. in. pp. 310-312, and Tables for Statisticians and Biometricians, Part II. p. xvi.








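which for calculation purposes is put in the form

$$n_r = h\,y\left[1 + h^2\,\frac{\alpha x^2 + \beta x + \gamma}{x^2(1-x)^2}\right],$$

where $\alpha$, $\beta$ and $\gamma$ depend on the constants $m_1$ and $m_2$ of the curve.

Curve of type $y = \dfrac{x^p e^{-x}}{\Gamma(p+1)}$ (Tables IIIa-c, IV-VII),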



The last two of these curves are respectively Pearson’s Type I and Type III frequency curves,
in simplest form, with total frequency unity and origin at one finite terminal.

In the tables appended there are set out the mid-ordinate y of each strip, and its corresponding
abscissa x; the true area n of each strip; the value $n_{(i)}$ obtained from formula (i); the value
$n_{(ii)}$ obtained from formula (ii); and the errors in the last two results. Where the curve has
a finite terminal and the two terminal strips are appreciable, there are set out for comparison n,
$n_{(i)}$ obtained by putting one or two ordinates zero in (i), $n_{(i)'}$ obtained from (i)', and $n_{(ii)}$.

In the tables of the Normal Curve the true values n are taken from Sheppard’s tables of this 
curve; in those of the Type I curve they are taken from unpublished tables of the Incomplete 
B-Function in the Department of Applied Statistics, University College, London; while for the 
Type III curves the true values are obtained from the published tables of the Incomplete
Γ-Function.

An examination of the tables shows that, disregarding terminal strips, there is very little to
choose, as regards accuracy, between the formulae (i) and (ii). For an interval h equal to one-half
the standard deviation σ of the curve, both formulae give, in about 90% of results, three significant
figures correctly, with only small errors in the third figure in the remaining results; in
about 70% of results, four significant figures are correct. For h = ⅓σ or smaller, both formulae
can be reasonably relied on to give everywhere four or more significant figures correctly.
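
The comparison can be repeated for the normal curve with h equal to one-half the standard deviation. The sketch below is illustrative only: the function names are invented for the purpose, the true area is taken from the error function rather than from Sheppard's tables, and formula (ii) is put in the form it assumes for the normal curve, for which $f''(x) = (x^2 - 1) f(x)$.

```python
import math

def normal_ordinate(x):
    """Ordinate of the normal curve with unit standard deviation."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def area_by_formula_i(curve, x, h):
    """Five-ordinate formula (i) for the strip of width h centred at x."""
    return (h / 5760.0) * (
        5178.0 * curve(x)
        + 308.0 * (curve(x - h) + curve(x + h))
        - 17.0 * (curve(x - 2.0 * h) + curve(x + 2.0 * h))
    )

def area_by_formula_ii_normal(x, h):
    """Formula (ii) for the normal curve, using f''(x) = (x^2 - 1) f(x)."""
    return h * normal_ordinate(x) * (1.0 + h * h * (x * x - 1.0) / 24.0)

def true_normal_area(x, h):
    """Area of the strip from x - h/2 to x + h/2 under the normal curve."""
    def cumulative(t):
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return cumulative(x + 0.5 * h) - cumulative(x - 0.5 * h)

h = 0.5
for x in (0.0, 0.5, 1.0, 1.5, 2.0):
    n = true_normal_area(x, h)
    n_i = area_by_formula_i(normal_ordinate, x, h)
    n_ii = area_by_formula_ii_normal(x, h)
    print(f"{x:4.1f}  {n:.6f}  {n_i:.6f}  {n_ii:.6f}")
```

For the central strip the three values agree with the first line of Table Ia, namely .197 413, .197 428 and .197 393.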

With regard to the terminal strips, it is seen that formulae (i)', specially devised for these
strips, give by far the worst results (except in Table IIIc); ludicrously bad results are obtained
for the actual end strip. Formula (ii) is obviously the more accurate for the terminal strips, in
the curves examined. The areas under consideration, however, are small compared with the total
area; thus for h = ½σ, the most unfavourable case considered is in Table IIIa, where the two
terminal strips amount together to about 14% of the total area. The total error involved in
using formula (i) for these two areas is less than 0.2% of the total area, i.e., less than 1 in a total
frequency of 500. For smaller values of h the error is of course much smaller.

Thus it would appear from these tables that formula (i), being as accurate as, and much
easier of application than, formula (ii), is the better formula from which to calculate sub-frequencies
in the types of distributions considered; it is also accurate enough for most purposes
in calculating terminal frequencies.

It is hoped that the tables will indicate the degree of accuracy to be expected for various
values of h in the more usual frequency distributions.





h = ½σ = .5.

TABLE Ia.

Normal Curve: $y = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 x^2}$.

                                                          Error × 10^6
   x         y           n          n(i)        n(ii)     in n(i)   in n(ii)
   0      .398 942    .197 413    .197 428    .197 393      +15       -20
 ± .5     .352 065    .174 666    .174 671    .174 657       +5        -9
 ±1.0     .241 971    .120 978    .120 968    .120 985      -10        +7
 ±1.5     .129 518    .065 591    .065 583    .065 602       -8       +11
 ±2.0     .053 991    .027 835    .027 836    .027 839       +1        +4
 ±2.5     .017 528    .009 245    .009 248    .009 243       +3        -2
 ±3.0     .004 432    .002 403    .002 404    .002 401       +1        -2
 ±3.5     .000 873    .000 489    .000 488    .000 487       -1        -2
 ±4.0     .000 134    .000 078    .000 077    .000 077       -1        -1
 ±4.5     .000 016    .000 010    .000 010    .000 010        0         0
 ±5.0     .000 001    .000 001    .000 001    .000 001        0         0


h = ⅓σ = ⅓.

TABLE Ib.

                                                             Error × 10^6
   x         y             n          n(i)        n(ii)     in n(i)   in n(ii)
   0      .398 942      .132 368    .132 369    .132 365      +1        -3
 ± ⅓      .377 384      .125 279    .125 279    .125 277       0        -2
 ± ⅔      .319 448      .106 209    .106 209    .106 209       0         0
 ±1       .241 971      .080 656    .080 655    .080 657      -1        +1
 ±1⅓      .164 010      .054 865    .054 865    .054 867       0        +2
 ±1⅔      .099 477      .033 431    .033 430    .033 432      -1        +1
 ±2       .053 991      .018 246    .018 246    .018 247       0        +1
 ±2⅓      .026 222      .008 920    .008 921    .008 920      +1         0
 ±2⅔      .011 396      .003 906    .003 907    .003 906      +1         0
 ±3       .004 432      .001 532    .001 532    .001 532       0         0
 ±3⅓      .001 542      .000 538    .000 538    .000 538       0         0
 ±3⅔      .000 480      .000 169    .000 169    .000 169       0         0
 ±4       .000 134      .000 048    .000 048    .000 048       0         0
 ±4⅓      .000 033      .000 012    .000 012    .000 012       0         0
 ±4⅔      .000 007(5)   .000 003    .000 003    .000 003       0         0
 ±5       .000 001      .000 001    .000 001    .000 001       0         0




TABLE Ic.


This table has been calculated and it is found that $n_{(i)}$ and $n_{(ii)}$ are everywhere correct to
six decimal places.












h — ‘02 = Jo- approx. 


TABLE IP. 























































TABLE 11°. 

Equation of Curve: ^ ^00000 069 694 06 
01 * l<r approx. 


0^(1 — a ) 


«(1) 

»(ii> 



•071 269 
*084 866 
•096 859 
•105 109 
*107 267 
*101 343 
*086 670 
*064 396 
•039 078 
•017 025 | 

•017 269 
•084 867 
•096 859 
•106 109 
•107 267 
•101 342 
•086 568 
•064 394 
•039 077 
•017 029 


Error x 10* 


in n {i) in »<«> 


Error x 10” 



002 568 
005 313 
010 696 
020 826 
038 889 
068 814 
113 323 
168 718 
216 785 
213 763 


*002 624 
*006 420 
*010 890 
*021 164 
*039 386 
•069 438(5) 
*113 822 
•168 444 
•213 762 
•209 892 


■002 624 
*005 420 
•010 891 
•021 154 
*039 387 
*069 440 
•113 822 
•168 434 
•213 729 
•209 879 


•002 624 
•005 420 
•010 890 
*021 153 
•039 386 
•069 441 
•113 830 
•168 467 
•213 767 
•209 844 




n (i) 

«<i Y 

•124 298 
•017 440 

•124 193 
*018 000 



n<H) in n<u in n w 


•123 767 
*019 481 

































































































TABLE V. 

Equation of Curve: y = —L a?*e~ 

A = 2a ^(T. 

























































h = 2.5 = ½σ.

TABLE VI.

Equation of Curve: $y = \frac{1}{24!}\,x^{24}e^{-x}$.

                                                                Error × 10^6
   x          y              n            n(i)         n(ii)      in n(i)   in n(ii)
  6.25    .000 0000     .000 000(4)    -.000 001    .000 000(3)     -1         0
  8.75    .000 0104     .000 046(5)     .000 044    .000 044        -2        -2
 11.25    .000 3928     .001 145(5)     .001 157    .001 138       +12        -7
 13.75    .003 5893     .009 972        .009 979    .009 975        +7        +3
 16.25    .016 2364     .042 011(5)     .041 988    .042 035       -23       +24
 18.75    .041 3336     .103 596        .103 585    .103 607       -11       +11
 21.25    .068 4148     .169 434        .169 453    .169 416       +19       -18
 23.75    .081 0450     .200 395        .200 409    .200 373       +14       -22
 26.25    .073 4796     .182 388        .182 384    .182 384        -4        -4
 28.75    .053 5346     .133 768        .133 759    .133 776        -9        +8
 31.25    .032 5081     .081 881        .081 878    .081 889        -3        +8
 33.75    .016 9210     .042 986(5)     .042 987    .042 990        +1        +4
 36.25    .007 7181     .019 777        .019 779    .019 777        +2         0
 38.75    .003 1398     .008 114        .008 115    .008 113        +1        -1


TABLE VII.

h = 3.5 = ½σ.

Equation of Curve: $y = \frac{1}{48!}\,x^{48}e^{-x}$.

                                                                Error × 10^6
   x          y              n            n(i)         n(ii)      in n(i)   in n(ii)
 22.75    .000 0014     .000 008        .000 007    .000 008        -1         0
 26.25    .000 0421     .000 196        .000 196    .000 194         0        -2
 29.75    .000 5167     .002 111        .002 118    .002 106        +7        -6
 33.25    .003 2493     .012 261        .012 266    .012 263        +5        +2
 36.75    .011 9707     .043 125        .043 110    .043 141       -15       +16
 40.25    .028 4774     .100 038        .100 024    .100 050       -14       +12
 43.75    .047 0604     .163 408        .163 419    .163 396       +11       -12
 47.25    .057 1429     .197 853        .197 869    .197 831       +16       -22
 50.75    .053 2811     .184 998        .184 998    .184 989         0        -9
 54.25    .039 5185     .138 094(5)     .138 085    .138 100        -9        +6
 57.75    .023 9927     .084 570        .084 565    .084 579        -5        +9
 61.25    .012 2080     .043 465        .043 466    .043 470        +1        +5
 64.75    .005 3092     .019 108        .019 110    .019 108        +2         0
 68.25    .002 0064     .007 302        .007 303    .007 301        +1        -1








(ii) A Simple Non-Normal Correlation Surface. 

By H. L. RIETZ (University of Iowa). 

Let $x_1$, $x_2$, $x_3$ be an observed set of variates each taken at random from a continuous rectangular
distribution from 0 to 1. Form the sums

$$x = x_1 + x_2,$$

$$y = x_1 + x_3$$

with $x_1$ in common.

Assume that pairs $(x, y)$ are thus formed, and plotted with a rectangular coordinate system.
We set the problem to determine the form of the correlation surface

$$z = f(x, y),$$

where $z\,dx\,dy$ gives, to within infinitesimals of higher order, the probability that, in a trial, a point
$(x, y)$ will fall into the rectangular element $dx\,dy$.

It is known* that the theoretical distribution function, say $f_1(x)$, of marginal totals of the
x-arrays of y’s would be the equal sides of an isosceles triangle whose equations may be written

$$f_1(x) = x \quad \text{for values of } x \text{ from 0 to 1} \qquad (1),$$

and

$$f_1(x) = 2 - x \quad \text{for values of } x \text{ from 1 to 2} \qquad (2).$$

We now propose the question: What is the geometrical form of the theoretical array of y’s
that corresponds to any assigned x in dx? In answering this question, we consider two cases.



* H. L. Rietz, “Proceedings International Mathematical Congress,” Vol. II. (1924) pp. 796-7;
J. O. Irwin, Biometrika, Vol. XIX. (1927) p. 287; Philip Hall, Biometrika, Vol. XIX. (1927) p. 240.






Case I. When x takes a value x' in the interval 0 to 1.

It will be helpful in the exposition which follows to give attention to a square of side 2 (Fig. 1)
which is the field for the scatter diagram of points $(x, y)$.

It is obvious that no points could fall into the corner of the square cut off by the line $P_1P_2$
whose equation is $y = x + 1$; for, $x_1 \le x' = DE$, and $x_3 \le 1 = BD$.

For values of y ranging from 0 to x', that is, for points along BC of the x'-array of y’s, the
frequency of values of $x_1 + x_3$ in an assigned dy is the same as the frequency in dy of the sum of
two variates each drawn independently from a rectangular distribution from 0 to 1. Hence, the
distribution curve for that part of the array of y’s from B to C may be represented in the
yz-plane vertical to the xy-plane of Figure 1 by the equations*

$$x = x', \qquad z = y \qquad (3).$$

For values of y ranging from x' to 1, that is, for points along CD, the distribution of y’s is
rectangular; for, no matter what value $x_1$ has in the interval 0 to x', the chance that $x_1 + x_3$ will
fall into a given dy is the same wherever dy is taken on CD. The equation for that part of the
theoretical array of y’s from C to D may then be written

$$x = x', \qquad z = c \ \text{ where } c \text{ is constant} \qquad (4).$$

For values of y ranging from 1 to 1 + x', that is, for points along DE, the distribution would
again be the same as that of the sum of two independent values each taken from the interval
0 to 1. Thus, the distribution curve for the part of the array of y’s from D to E is given by

$$x = x', \qquad z = -y + x' + 1 \qquad (5).$$

Since the marginal total for the section x = x' is x', we may determine c in (4) by equating the
area of the total section to x'. Thus

$$\tfrac12 x'^2 + c\,(1 - x') + \tfrac12 x'^2 = x'.$$

This gives c = x'.

The theoretical array of y’s is an isosceles trapezoid.


Case II. When x takes a value x' in the interval 1 to 2.

When x takes a value x' in the interval 1 to 2, it follows at once that $x_1$ is not less than x' - 1.
Hence, it is clearly impossible for the point $(x, y)$ to fall into the corner of the square cut off by
the line $P_3P_4$ whose equation is $y = x - 1$.

For values of y ranging from x' - 1 to 1, that is, for points along FG, the distribution curve for
the x'-array of y’s is given by

$$x = x', \qquad z = y - x' + 1 \qquad (6).$$

For values of y ranging from 1 to x', that is, for points along GH, the distribution is rectangular
and is given by

$$x = x', \qquad z = k \ \text{ where } k \text{ is a constant} \qquad (7).$$

Since the marginal total for x = x' is 2 - x', we determine k in (7) by equating the area of the
section to 2 - x'. Thus

$$\tfrac12 (2 - x')^2 + k\,(x' - 1) + \tfrac12 (2 - x')^2 = 2 - x'.$$

This gives k = 2 - x'.


* loc. cit. p. 797.










For values of y ranging from x' to 2, that is, for points along HI, the distribution curve in
the x'-array of y’s is given by

$$x = x', \qquad z = 2 - y \qquad (8).$$

It is now fairly obvious that the typical x-array of y’s is an isosceles trapezoid which degenerates
into straight lines when x' = 0 and when x' = 2, and becomes an isosceles triangle when
x' = 1.

From equations (3) to (8) we have the answer to the question about the probability that a
y corresponding to an assigned x will fall into an assigned dy.

From simple geometrical considerations it follows that sections y = y' of our surface $z = f(x, y)$
are similar to sections x = x'. The theoretical distribution of marginal totals of the y-arrays of
x’s is given by

$$f_2(y) = y \quad \text{for } 0 \le y \le 1,$$

$$f_2(y) = -y + 2 \quad \text{for } 1 \le y \le 2.$$

The line of regression of y’s on x would be

$$y - 1 = \tfrac12 (x - 1) \qquad (9),$$

since from symmetry the centroids of sections x = x' lie on the line PP whose slope is ½.

Likewise, the regression line of x’s on y is

$$x - 1 = \tfrac12 (y - 1) \qquad (10).$$

To summarise, we may describe the correlation surface by saying that for any assigned x in
the interval 0 to 1, the distribution function of the array of y’s is given by

$$z = y \quad \text{when } 0 \le y \le x,$$

$$z = x \quad \text{when } x \le y \le 1,$$

$$z = -y + x + 1 \quad \text{when } 1 \le y \le 1 + x,$$

and for any assigned x in the interval 1 to 2, the distribution function of the array of y’s is
given by

$$z = y - x + 1 \quad \text{when } x - 1 \le y \le 1,$$

$$z = 2 - x \quad \text{when } 1 \le y \le x,$$

$$z = 2 - y \quad \text{when } x \le y \le 2.$$

By making x = 1 in z = x or in z = 2 - x, we get z = 1, the largest value of z. It now becomes
fairly obvious that our correlation surface is simply the hexagonal pyramid of unit altitude on
the base OP x P 2 P Q P i P s with the foot of the altitude at the centre of the base. 
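
The summary above translates directly into a piecewise function, and two of its properties can be checked numerically: the section at any x' should have area equal to the triangular marginal total (x' or 2 - x'), and the whole surface should enclose unit volume. This is a minimal sketch; the names `rietz_surface`, `array_total` and `volume`, and the use of the mid-point rule with the step counts shown, are illustrative assumptions.

```python
def rietz_surface(x, y):
    """Height z = f(x, y) of the correlation surface for x = x1 + x2, y = x1 + x3."""
    if x < 0.0 or x > 2.0:
        return 0.0
    if x <= 1.0:
        if 0.0 <= y <= x:
            return y
        if x < y <= 1.0:
            return x
        if 1.0 < y <= 1.0 + x:
            return 1.0 + x - y
        return 0.0
    if x - 1.0 <= y <= 1.0:
        return y - x + 1.0
    if 1.0 < y <= x:
        return 2.0 - x
    if x < y <= 2.0:
        return 2.0 - y
    return 0.0

def array_total(x, steps=2000):
    """Mid-point-rule area of the x-array of y's (the section at a fixed x)."""
    dy = 2.0 / steps
    return sum(rietz_surface(x, (j + 0.5) * dy) for j in range(steps)) * dy

def volume(steps=400):
    """Mid-point-rule volume under the whole surface over the square of side 2."""
    dx = 2.0 / steps
    return sum(array_total((i + 0.5) * dx, steps) for i in range(steps)) * dx

print(round(array_total(0.5), 4), round(array_total(1.5), 4), round(volume(), 4))
```

Both sections should come out close to one-half, and the volume close to unity, as a frequency surface of total probability 1 requires.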

For urn schemata with situations analogous to those of the present problem but concerned 
with discrete rather than with continuous variables, we may cite a paper* that considers the 
correlation of the sums of first and second throws with two dice. 


(iii) Professor Rietz’s Problem.

EDITORIAL.

This problem is more or less familiar in a more generalised form, and has been used not infrequently
to explain “correlation by a common factor.”

Let $y_1 = x_1 + x_3$, and $y_2 = x_2 + x_4$.


* H. L. Rietz, Annals of Mathematics , Vol. xxu. (1919—21) pp. 506—822. 







Let $\sigma_s$ be the standard deviation of $x_s$ and $\Sigma_s$ of $y_s$; then if $r_{ss'}$ be the correlation coefficient of
$x_s$ and $x_{s'}$, and R of $y_1$, $y_2$, we have at once from the definitions:

$$\Sigma_1^2 = \sigma_1^2 + \sigma_3^2 + 2\sigma_1\sigma_3 r_{13} \qquad \text{(i)},$$

$$\Sigma_2^2 = \sigma_2^2 + \sigma_4^2 + 2\sigma_2\sigma_4 r_{24} \qquad \text{(ii)},$$

$$R = \frac{\sigma_1\sigma_2 r_{12} + \sigma_1\sigma_4 r_{14} + \sigma_2\sigma_3 r_{23} + \sigma_3\sigma_4 r_{34}}{\Sigma_1\Sigma_2} \qquad \text{(iii)}.$$

Still more general equations will be found in the Phil. Trans. Vol. 192 A, pp. 260-261.

Professor Rietz takes his $x_1$ and $x_2$ identical and supposes $x_1$, $x_3$, $x_4$ drawn from the same
frequency distribution. Hence

$$\sigma_1 = \sigma_2 = \sigma_3 = \sigma_4$$

and

$$R = \frac{1 + r_{14} + r_{23} + r_{34}}{2\sqrt{1 + r_{13}}\,\sqrt{1 + r_{24}}}.$$

Supposing $x_1$, $x_3$, $x_4$ to be independent drawings, he has all his coefficients of correlation zero,
and finds R = ½.


The advantages of the more general form are that R can take any numerical value, $x_1$, $x_2$, $x_3$, $x_4$
can have any degree of interdependence, and they may be drawn from quite different frequency
distributions.

If we suppose $x_2$ identical with $x_1$ and $x_3$ drawn from the same frequency distribution as $x_4$,
we have, supposing no correlation,

$$R = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_3^2},$$

which can by properly choosing the ratio of the variances take any desired value. For example,
if we wish to obtain experimental data with a correlation

$$\frac{n}{m}, \qquad m > n,$$

we toss m coins and count the heads for our first variate, we leave n on the table (a bed is better!)
and toss again the m - n picked up, the heads on the total m now on the table give our second
variate, and the process being often repeated the correlation of the two variates will have the
requisite value n/m. The experiment is of value in teaching a student how a “common factor” can
be made to give any arbitrary correlation. A similar experiment with dice was habitually used
by the late Professor Weldon to introduce his students to the subject of correlation.
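
The coin experiment is easily imitated by machine. The sketch below is an illustration only: the helper names, the choice m = 10 and n = 4, the number of repetitions and the seed are all assumptions made for the example. The first function evaluates R from Equations (i) to (iii); the second carries out the tossing and returns the observed correlation, which should lie near n/m.

```python
import math
import random

def common_factor_R(s1, s2, s3, s4, r12=0.0, r13=0.0, r14=0.0,
                    r23=0.0, r24=0.0, r34=0.0):
    """Correlation R of y1 = x1 + x3 and y2 = x2 + x4 from Equations (i)-(iii)."""
    big_sigma_1 = math.sqrt(s1 * s1 + s3 * s3 + 2.0 * s1 * s3 * r13)
    big_sigma_2 = math.sqrt(s2 * s2 + s4 * s4 + 2.0 * s2 * s4 * r24)
    product_sum = (s1 * s2 * r12 + s1 * s4 * r14
                   + s2 * s3 * r23 + s3 * s4 * r34)
    return product_sum / (big_sigma_1 * big_sigma_2)

def correlation(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def coin_experiment(m, n, trials, seed=1932):
    """Toss m coins, leave n on the table, re-toss the other m - n; correlate head counts."""
    rng = random.Random(seed)
    first_counts, second_counts = [], []
    for _ in range(trials):
        coins = [rng.random() < 0.5 for _ in range(m)]
        left_on_table = coins[:n]                       # the common factor
        tossed_again = [rng.random() < 0.5 for _ in range(m - n)]
        first_counts.append(sum(coins))
        second_counts.append(sum(left_on_table) + sum(tossed_again))
    return correlation(first_counts, second_counts)

rietz_R = common_factor_R(1.0, 1.0, 1.0, 1.0, r12=1.0)   # x1 and x2 identical
observed = coin_experiment(m=10, n=4, trials=20000)
print(round(rietz_R, 3), round(observed, 3))
```

With x1 and x2 identical and all standard deviations equal, the first value is the R = ½ of Professor Rietz's case; the second is subject to sampling error about 4/10.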

If we suppose our pairs are drawn from a fourfold normal distribution, which in some
respects is of more practical value than a rectangular one, the surface will be of the form

$$z = z_0 \exp\{-\tfrac12 (\text{quadratic function of } x_1, x_2, x_3, x_4)\}\ [dx_1\,dx_2\,dx_3\,dx_4].$$

Now replace $x_1$ by $y_1 - x_3$ and $x_2$ by $y_2 - x_4$, and we have

$$z = z_0 \exp\{-\tfrac12 (\text{quadratic function of } y_1, y_2, x_3, x_4)\}\ [dy_1\,dy_2\,dx_3\,dx_4].$$

Integrate out $x_3$ and $x_4$ between the limits $\pm\infty$, and we have

$$z = z_0' \exp\{-\tfrac12 (\text{quadratic function of } y_1, y_2)\}\ [dy_1\,dy_2].$$

We need not, however, perform these operations in order to determine what the coefficients of
$y_1^2$, $y_1y_2$ and $y_2^2$ are*. They will be given by

$$z = \frac{1}{2\pi\Sigma_1\Sigma_2\sqrt{1 - R^2}} \exp\left\{-\frac{1}{2(1 - R^2)}\left(\frac{y_1^2}{\Sigma_1^2} - \frac{2R\,y_1y_2}{\Sigma_1\Sigma_2} + \frac{y_2^2}{\Sigma_2^2}\right)\right\}\ [dy_1\,dy_2] \qquad \text{(iv)},$$

where $\Sigma_1$, $\Sigma_2$ and R are provided by Equations (i)-(iii) above.

Whatever the distribution may be, if $x_1$ ($= x_2$), $x_3$ and $x_4$ are drawn from that distribution
only, R = ½.

* I once did so; it forms an interesting exercise in determinantal analysis, and of course concludes
with (iv).









(iv) Note on a Memoir by A. E. R. Church.

Biometrika, Vol. XVIII.

In a memoir on the “Means and Squared Standard Deviations of Small Samples” of 1926
I gave, p. 382, a formula for the fourth moment coefficient of the variance of samples of N from
a limited population of size M following any law of frequency. A printer’s error was discovered,
namely the number 217 was read for 271, in the ninth line of p. 382, and this was published in
Biometrika, Vol. XIX, list of Errata at the beginning. Lately, in a memoir published by Dr L.
Isserlis in the R.S. Proc. A, Vol. 132, 1931, further errors are referred to, namely p. 382, line 5,
-18N + 77 is read for -42N + 133; line 6, +2880 for -2880, and line 7, +21840 for -6040.
Having verified these corrections I am very grateful to Dr Isserlis for drawing my attention to
them and hope those possessing copies of Biometrika will note the changes mentioned.

As I stated in my memoir, the practical use of a formula covering three and a half pages of
Biometrika is very small; in fact the main object of its publication was to indicate that the
moment coefficients of moment coefficients of samples from finite populations have reached, with
the variance, unmanageable dimensions, and that general discussions on such moment coefficients,
though they may be excellent examples of algebraic manipulation, are of small statistical value,
for the results are quite unusable in practice.



Volume XXIV 


NOVEMBER, 1932 


Parts III and IV 


BIOMETRIKA 


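FURTHER APPLICATIONS IN STATISTICS OF THE
$T_m(x)$ BESSEL FUNCTION.

By KARL PEARSON, S. A. STOUFFER and F. N. DAVID*.


(1) The $T_m(x)$ function was defined in a paper by Pearson, Jeffery and Elderton†
to be given by

$$T_m(x) = \frac{x^m K_m(x)}{2^m \sqrt{\pi}\;\Gamma(m + \tfrac12)} \qquad \text{(i)},$$

where $K_m(x)$ is the Bessel Function of the second order and imaginary argument.
Here $T_m(x) = T_m(-x)$, while x on the right is always to be given its numerical
value. Remembering this, we need not write $|x|^m K_m(|x|)$ in the equation.

If

$$y = M\,T_m(x) \qquad \text{(ii)}$$

be treated as a frequency curve, it will be symmetrical and run from $-\infty$ to $+\infty$ of x.
The constant in (i) has been so chosen that

$$\int_{-\infty}^{\infty} y\,dx = 2M\int_0^{\infty} T_m(x)\,dx = M.$$

An integral form of $K_m(x)$ is given by‡

$$K_m(x) = \frac{\sqrt{\pi}\,(\tfrac12 x)^m}{\Gamma(m + \tfrac12)} \int_1^{\infty} e^{-xt}\,(t^2 - 1)^{m - \frac12}\,dt \qquad \text{(iii)}.$$

Hence we may write (ii) in the form

$$y = \frac{M\,x^{2m}}{2^{2m}\,\Gamma^2(m + \tfrac12)} \int_1^{\infty} e^{-xt}\,(t^2 - 1)^{m - \frac12}\,dt \qquad \text{(iv)}.$$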


(2) Consider in the next place the curve

$$y = y_0\,e^{-px/a}\left(1 + \frac{x}{a}\right)^p \qquad \text{(v)},$$

the origin being the mode at distance a from the start of the curve.
It follows easily that

$$y_0 = \frac{M}{a}\,\frac{p^{p+1}e^{-p}}{\Gamma(p+1)} \qquad \text{(vi)},$$

where M is the total frequency.

* The suggestion of the problem and the selection of the illustrative examples were provided by
S. A. Stouffer, the solution through the $T_m(x)$ function was given by K. Pearson, who is also responsible
for the text. Florence N. David computed the table of the probability integral of the $T_m(x)$ distribution.

† Biometrika, Vol. XXI. p. 184.

‡ G. N. Watson: A Treatise on the Theory of the Bessel Functions, p. 172, Equation (4).











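Thus the curve can be written

$$y = \frac{M}{a}\,\frac{p^{p+1}}{\Gamma(p+1)}\,e^{-p\left(1 + \frac{x}{a}\right)}\left(1 + \frac{x}{a}\right)^p \qquad \text{(vii)}.$$

Write $z = p\left(1 + \dfrac{x}{a}\right)$ and the moments about the start of the curve can be found at
once. These lead to*

$$\left.\begin{aligned}
\text{Mean} &= \mu_1' = a\,(p+1)/p\\
\text{Standard Deviation} &= \sigma = a\sqrt{p+1}/p\\
\beta_1 &= \frac{4}{p+1}, \qquad \beta_2 = 3 + \frac{6}{p+1}
\end{aligned}\right\} \qquad \text{(viii)},$$

providing the well-known relation, $2\beta_2 - 3\beta_1 - 6 = 0$.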


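(3) Now suppose there are two independent variates u and v both of which have
frequency distributions provided by Equation (vii). We assume the two distributions
to have the same p, but to have different standard deviations $\sigma_1$ and $\sigma_2$, or, what
amounts to the same thing, different modal distances a and b. We will measure our
variates u and v from the start of their curves, which then take the form

$$y_1 = \frac{Mp}{a}\,e^{-pu/a}\left(\frac{pu}{a}\right)^p \Big/ \Gamma(p+1)$$

and

$$y_2 = \frac{Mp}{b}\,e^{-pv/b}\left(\frac{pv}{b}\right)^p \Big/ \Gamma(p+1).$$

If we take $z = y_1y_2/M$ we obtain the combined frequency surface

$$z = \frac{M\,p^2}{ab\,\Gamma^2(p+1)}\,e^{-p\left(\frac{u}{a} + \frac{v}{b}\right)}\left(\frac{p^2\,uv}{ab}\right)^p \qquad \text{(ix)}.$$

If $X = p\left(\dfrac{u}{a} + \dfrac{v}{b}\right)$ and $Y = p\left(\dfrac{v}{b} - \dfrac{u}{a}\right)$, then the element for integration of the
above surface is $du\,dv$, or if we take it $d\left(\dfrac{pu}{a}\right) d\left(\dfrac{pv}{b}\right)$ we may replace it by $\tfrac12\,dX\,dY$,
and we have for integration

$$z\,dX\,dY = \frac{M}{2^{2p+1}\,\Gamma^2(p+1)}\,e^{-X}\,(X^2 - Y^2)^p\,dX\,dY \qquad \text{(x)}.$$

We have to integrate this out for X to get the distribution curve of Y. In the
upper octant XOB (Fig. 1, p. 295) the limit for X is clearly X = Y to X = ∞ along
the shaded area. Or, the curve of distribution of Y is

$$z = \frac{M}{2^{2p+1}\,\Gamma^2(p+1)} \int_Y^{\infty} e^{-X}\,(X^2 - Y^2)^p\,dX \qquad \text{(xi)}.$$

Put X = Yt and we have

$$z = \frac{M}{2^{2p+1}\,\Gamma^2(p+1)}\,Y^{2p+1} \int_1^{\infty} e^{-Yt}\,(t^2 - 1)^p\,dt \qquad \text{(xi bis)}.$$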

* Phil. Trans., Vol. 185 A, p. 873.










If we take the lower octant XOA, the limits of X are -Y to ∞, but as Y is now
negative we get precisely the same result, or we say that the whole curve of distribution
of Y is (xi), Y being taken as positive, and from 0 to ∞, and mirrored in the
axis of X. This result also flows from the fact that the distribution of $\dfrac{v}{b} - \dfrac{u}{a}$ must be
a symmetrical curve, as the frequency curves for u/a and v/b are identical.

Now if in (iv) we write x = Y, m = p + ½, we see that the z of (xi) is given by

$$z = M\,T_{p+\frac12}(Y) \qquad \text{(xii)},$$

which leads to ½M for the area of our half curve. In other words our curve for Y is
the $T_{p+\frac12}$ curve mirrored on itself. The ordinates of this curve have been computed
by Dr E. M. Elderton*.



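(4) Now the odd moments of the mirrored curve vanish. Let us find the even
moment-coefficients. We have from (x)

$$\mu_{2s} = \frac{2}{2^{2p+1}\,\Gamma^2(p+1)} \iint Y^{2s}\,e^{-X}\,(X^2 - Y^2)^p\,dX\,dY,$$

where the limits of X and Y are to be chosen so as to cover the upper octant BOX.
Now if we integrate first with regard to Y, the limits will be from 0 to X, and then
with regard to X from 0 to ∞. Thus

$$\mu_{2s} = \frac{1}{2^{2p}\,\Gamma^2(p+1)} \int_0^{\infty}\!\!\int_0^{X} e^{-X}\,Y^{2s}\,(X^2 - Y^2)^p\,dY\,dX \qquad \text{(xiii)}.$$

Put $Y = X\lambda$ and we have

$$\mu_{2s} = \frac{1}{2^{2p}\,\Gamma^2(p+1)} \int_0^{\infty} e^{-X}\,X^{2s+2p+1}\,dX \int_0^1 \lambda^{2s}\,(1 - \lambda^2)^p\,d\lambda,$$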

* See Biometrika, Vol. XXI, pp. 194-201, or Tables for Statisticians and Biometricians, Part II.
pp. lxxix-lxxxviii and 138-144.









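or, if $\lambda^2 = \kappa$,

$$\mu_{2s} = \frac{1}{2^{2p+1}\,\Gamma^2(p+1)}\,\Gamma(2s+2p+2) \int_0^1 \kappa^{s-\frac12}\,(1 - \kappa)^p\,d\kappa
= \frac{1}{2^{2p+1}\,\Gamma^2(p+1)}\,\Gamma(2s+2p+2)\,\frac{\Gamma(s+\frac12)\,\Gamma(p+1)}{\Gamma(s+p+\frac32)}.$$

If s = 0,

$$\mu_0 = \frac{1}{2^{2p+1}\,\Gamma(p+1)}\,\frac{\Gamma(2p+2)\,\Gamma(\frac12)}{\Gamma(p+\frac32)}.$$

Hence, dividing by $\mu_0$,

$$\mu_{2s} = \frac{\Gamma(2s+2p+2)}{\Gamma(2p+2)}\,\frac{\Gamma(p+\frac32)}{\Gamma(s+p+\frac32)}\,\frac{\Gamma(s+\frac12)}{\Gamma(\frac12)} \qquad \text{(xiv)},$$

and

$$\mu_2 = \frac{(2p+3)(2p+2)\,\frac12}{p+\frac32} = 2p+2 \qquad \text{(xv)}.$$

Generally

$$\mu_{2s} = (2s-1)(2p+2s)\,\mu_{2s-2} \qquad \text{(xv bis)},$$

$$\beta_{2s-2} = \frac{\mu_{2s}}{\mu_2^{\,s}} = \frac{(2s-1)(2p+2s)\,\mu_{2s-2}}{(2p+2)\,\mu_2^{\,s-1}},$$

$$\beta_{2s-2} = (2s-1)\left(1 + \frac{s-1}{p+1}\right)\beta_{2s-4} \qquad \text{(xvi)}.$$

Thus finally

$$\beta_{2s-2} = (2s-1)(2s-3)\ldots 1\,\left(1 + \frac{1}{p+1}\right)\left(1 + \frac{2}{p+1}\right)\ldots\left(1 + \frac{s-1}{p+1}\right) \qquad \text{(xvii)}.$$

It will be clear that when $p \to \infty$ we obtain

$$\beta_{2s-2} = (2s-1)(2s-3)\ldots 1,$$

the familiar $\beta_{2s-2}$ formula for the normal curve, into which the $T_{p+\frac12}$ function then
passes.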
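
The moment relations of this section lend themselves to a numerical check. The sketch below assumes the gamma-function form of $\mu_{2s}$ for unit total frequency, namely $\Gamma(2s+2p+2)\,\Gamma(p+\frac32)\,\Gamma(s+\frac12)$ divided by $\Gamma(2p+2)\,\Gamma(s+p+\frac32)\,\Gamma(\frac12)$, and compares it with the product form of $\beta_{2s-2}$; the function names `mu_even` and `beta_even` and the trial value p = 2 are illustrative only.

```python
import math

def mu_even(s, p):
    """Even moment mu_{2s} of the mirrored curve (unit total frequency),
    computed from the gamma-function expression via log-gammas."""
    log_mu = (math.lgamma(2 * s + 2 * p + 2) + math.lgamma(p + 1.5)
              + math.lgamma(s + 0.5) - math.lgamma(2 * p + 2)
              - math.lgamma(s + p + 1.5) - math.lgamma(0.5))
    return math.exp(log_mu)

def beta_even(s, p):
    """beta_{2s-2} = mu_{2s} / mu_2^s from the product
    (2s-1)(2s-3)...1 (1 + 1/(p+1)) ... (1 + (s-1)/(p+1))."""
    value = 1.0
    for k in range(1, s + 1):
        value *= (2 * k - 1) * (1.0 + (k - 1) / (p + 1.0))
    return value

p = 2.0
mu2 = mu_even(1, p)
for s in (2, 3, 4):
    from_moments = mu_even(s, p) / mu2 ** s
    print(s, round(from_moments, 6), round(beta_even(s, p), 6))
```

For large p the second function tends to 3, 15, 105, ..., the values for the normal curve.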


Consider the Type VII curve

$$y = y_0\left(1 + \frac{x^2}{a^2}\right)^{-\frac12 n}.$$

Here we have $\beta_2 = 3\,(n-3)/(n-5)$
and $\mu_2 = a^2/(n - 3)$.


Now it is clear that we can make $\mu_2$ and $\mu_4$ agree in the Type VII and the $T_{p+\frac12}$
curves*, but farther than that we cannot go, although the higher moment-coefficients may
not differ widely if n be considerable. The $T_{p+\frac12}$ curve has the further advantage that no
moment-coefficients tend to become infinite, while if n be an odd integer, those for the Type VII
curve may become so. For values of p not too great the Type VII will fit the distribution
of Y considerably better than the normal curve. For considerable values
of p, both Type VII and the $T_{p+\frac12}$ curves pass into the normal curve.
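
The agreement of the second and fourth moments fixes the constants of the Type VII curve. The working below is a sketch under the assumption that the Type VII curve is written with exponent $-\tfrac12 n$ on $(1 + x^2/a^2)$, so that $\mu_2 = a^2/(n-3)$ and $\beta_2 = 3(n-3)/(n-5)$; the values for the $T_{p+\frac12}$ curve are those found in section (4).

```latex
\begin{aligned}
T_{p+\frac12}\ \text{curve:}\quad
  & \mu_2 = 2p+2, \qquad
    \beta_2 = \frac{\mu_4}{\mu_2^{2}} = 3\left(1+\frac{1}{p+1}\right), \\
\text{Type VII curve:}\quad
  & \mu_2 = \frac{a^{2}}{n-3}, \qquad
    \beta_2 = \frac{3\,(n-3)}{n-5} = 3\left(1+\frac{2}{n-5}\right), \\
\text{equal } \beta_2:\quad
  & \frac{1}{p+1} = \frac{2}{n-5}
    \;\Longrightarrow\; n = 2p+7, \\
\text{equal } \mu_2:\quad
  & a^{2} = (n-3)(2p+2) = (2p+4)(2p+2)
    \;\Longrightarrow\; a = 2\sqrt{(p+1)(p+2)} .
\end{aligned}
```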


(5) A few further points may be noted. If p = -½ the $T_{p+\frac12}$-curve asymptotes to
the vertical at the origin, and this holds as long as p lies between -½ and 0; if


* We must take $\dfrac{1}{p+1} = \dfrac{2}{n-5}$, or $n = 2p + 7$, and $a = 2\sqrt{(p+1)(p+2)}$.









p = 0, the $T_{p+\frac12}$-curve starts with a finite ordinate and makes a finite angle with the
vertical, it is the exponential curve. If p be positive we see from (xi bis) that
dz/dY = 0 for Y = 0, or the double mirror curves have a common tangent at the
axis of symmetry and will in appearance form a single curve. If p be a positive
integer it is possible to expand z in powers of Y, but the series does not present
any great advantages to the computer.

When p = 11, Dr Elderton’s Tables terminate, but it is shown in the memoir by
Pearson, Jeffery and Elderton* that when p = 11, the two curves

$$z = M\,T_{p+\frac12}(Y)$$

and

$$z = \frac{M}{2\sqrt{\pi\,(p+1)(p+2)}}\;\frac{\Gamma\left(\tfrac12(2p+7)\right)}{\Gamma(p+3)}\;\frac{1}{\left\{1 + \dfrac{Y^2}{4\,(p+1)(p+2)}\right\}^{\frac12(2p+7)}} \qquad \text{(xviii)}$$

coincide for practical statistical purposes. The areas of this latter curve up to given
values of Y have been tabled† from p = -½ to p = 12, but this hardly carries us
beyond the $T_m$-tables. The completed (and now at press) Tables of the Incomplete
B-function carry us up to 2p + 7 = 101, or p = 47.


(6) Now let us turn to the means of samples of size n drawn from the Type III
curve

$$y = y_0\,e^{-px/a}\left(\frac{x}{a}\right)^p \qquad \text{(xix)},$$

where the origin is at the start of the curve and a is the distance to the mode from
the start. Let us suppose a sample $x_1, x_2, x_3 \ldots x_n$ drawn and let its mean be
$\bar{x}_n = (x_1 + x_2 + \ldots + x_n)/n$. Then the chance P of a sample lying between $x_1$ and $x_1 + \delta x_1$,
$x_2$ and $x_2 + \delta x_2$, ... $x_n$ and $x_n + \delta x_n$ is given by

$$P = \text{const.} \times e^{-\frac{p}{a}(x_1 + x_2 + \ldots + x_n)}\left(\frac{x_1x_2 \ldots x_n}{a^n}\right)^p dx_1\,dx_2 \ldots dx_n.$$

Now get rid of $x_1$ by introducing $\bar{x}_n$ as a variable and write $l_2$ for
$n\bar{x}_n - x_3 - \ldots - x_n$. We have

$$P = \text{const.} \times e^{-np\bar{x}_n/a}\,d\bar{x}_n\left(\frac{(l_2 - x_2)\,x_2\,x_3 \ldots x_n}{a^n}\right)^p dx_2 \ldots dx_n.$$

Put $x_2 = l_2x_2'$ and integrate out for $x_2 = 0$ to $l_2$ or $x_2' = 0$ to 1. This will introduce
a B-function into the constant, but leave us with

* Cf. Biometrika, Vol. xxi. pp. 171 and 173 for accordance of the curves. Their equations are given
on p. 183, where we must write $\tfrac12(n-1) = p + 1$, or $n = 2p + 3$. The two curves have then the same first four
moment-coefficients. If $\eta = Y/\{2\sqrt{(p+1)(p+2)}\}$, then the proportional area from $\eta = 0$ up to any arbitrary
value of $\eta$ is given by $\tfrac12 I_x(\tfrac12, p+1)$, where $I_x(\tfrac12, p+1) = B_x(\tfrac12, p+1)/B(\tfrac12, p+1)$, $B_x$ and $B$ being the
incomplete and complete Beta-functions.

† See Biometrika, Vol. xxii. pp. 253-283, or Tables for Statisticians and Biometricians, Part II,
pp. cxxv-cxlii and pp. 169-177.






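$$P = \text{const.} \times e^{-np\bar{x}_n/a}\,d\bar{x}_n\left(\frac{l_2}{a}\right)^{2p+1}\left(\frac{x_3 \ldots x_n}{a^{n-2}}\right)^{p} dx_3 \ldots dx_n.$$

Write $l_2 = l_3 - x_3$, and proceeding in the same way, we find

$$P = \text{const.} \times e^{-np\bar{x}_n/a}\,d\bar{x}_n\left(\frac{l_3}{a}\right)^{3p+2}\left(\frac{x_4 \ldots x_n}{a^{n-3}}\right)^{p} dx_4 \ldots dx_n,$$

where $l_3 = n\bar{x}_n - x_4 - x_5 - \ldots - x_n$.

Continuing to repeat this process we ultimately get rid of all the variables but
$\bar{x}_n$ and find*

$$P = \text{const.} \times e^{-np\bar{x}_n/a}\left(\frac{\bar{x}_n}{a}\right)^{n(p+1)-1} d\bar{x}_n \tag{xx}$$

We now put this into the canonical form for a Type III frequency curve, i.e.

$$y = y_0\,e^{-P\bar{x}_n/A}\left(\frac{\bar{x}_n}{A}\right)^{P} \tag{xx bis}$$

Hence we must have $P = n(p+1) - 1$, and $P/A = np/a$, or $A = a\,\dfrac{n(p+1)-1}{np}$.

Accordingly:

$$\text{Mean of } \bar{x}_n = M_1 = \frac{A(P+1)}{P} = \frac{a(p+1)}{p} = \bar{x}, \qquad \sigma^2_{\bar{x}_n} = \frac{A^2(P+1)}{P^2} = \frac{a^2(p+1)}{np^2} = \frac{\sigma_x^2}{n} \tag{xxi}$$

where $\bar{x}$ and $\sigma_x$ are the mean and standard deviation of the population from which
the sample of $n$ is drawn. Lastly

$$B_1 = \frac{4}{n(p+1)} \quad\text{and}\quad B_2 = 3 + \frac{6}{n(p+1)} \tag{xxii}$$

Clearly, if $n$ and $p$ are not very small, then (xx bis) will approach much nearer to
a normal distribution than the parent population (xix).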
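
The relations (xx bis) to (xxii) can be checked numerically. The following is a minimal sketch in Python, not part of the original memoir; the function name `mean_curve` and the class `TypeIIIMeanCurve` are hypothetical, and the parent curve is assumed to be (xix) with $p > 0$.

```python
from dataclasses import dataclass


@dataclass
class TypeIIIMeanCurve:
    """Constants of the Type III curve followed by the mean of n observations."""
    P: float         # exponent of the canonical form (xx bis)
    A: float         # modal distance of the curve of means
    mean: float      # M_1 in (xxi)
    variance: float  # squared standard deviation in (xxi)
    B1: float        # first moment ratio in (xxii)
    B2: float        # second moment ratio in (xxii)


def mean_curve(n: int, p: float, a: float) -> TypeIIIMeanCurve:
    """Apply P = n(p+1) - 1 and A = a{n(p+1) - 1}/(np), then (xxi) and (xxii)."""
    P = n * (p + 1) - 1
    A = a * P / (n * p)
    mean = A * (P + 1) / P             # reduces to a(p+1)/p, the parent mean
    variance = A**2 * (P + 1) / P**2   # reduces to a^2 (p+1)/(n p^2)
    B1 = 4 / (n * (p + 1))
    B2 = 3 + 6 / (n * (p + 1))
    return TypeIIIMeanCurve(P, A, mean, variance, B1, B2)


curve = mean_curve(n=10, p=3.0, a=2.0)
print(curve)
```

With these illustrative values the mean of the curve of means equals the parent mean $a(p+1)/p$ and its variance is one-tenth of the parent variance $a^2(p+1)/p^2$, as (xxi) requires.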

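(7) We can now apply our results to particular cases. If we draw two individuals
out of Type III curves like (xix), with the same skewness as measured by $p$, then if
$a$ and $a'$ be their modal distances, and

$$Y = p\left(\frac{x_1'}{a'} - \frac{x_1}{a}\right) = (p+1)\left(\frac{x_1'}{\bar{x}'} - \frac{x_1}{\bar{x}}\right) = \sqrt{p+1}\left(\frac{x_1'}{\sigma_x'} - \frac{x_1}{\sigma_x}\right),$$

for these are all equivalent, then the distribution of $Y$ is given by

$$z = M\,T_{p+\frac12}(Y).$$

If the two individuals are taken from absolutely the same population, i.e. $a' = a$,
then

$$Y = p\,\frac{x_1' - x_1}{a} = (p+1)\,\frac{x_1' - x_1}{\bar{x}} = \sqrt{p+1}\;\frac{x_1' - x_1}{\sigma_x}.$$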

* This result was published by Church, see Biometrika, Vol. xviii, p. 386.










Such results, however interesting in the case of experimental sampling in the 
Laboratory, where we have a knowledge of the parent population, will hardly be of 
practical service, because we should usually lack a knowledge of $p$, $\bar{x}$ and $\sigma_x$.


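Now turn to (xx), and suppose we have taken two samples of $n$ and that their
means are $\bar{x}_n$ and $\bar{x}_n'$, then the distribution of $Y = \dfrac{P}{A}(\bar{x}_n' - \bar{x}_n)$ will be

$$z = \tfrac12 M\,T_{P+\frac12}(Y) = \tfrac12 M\,T_{n(p+1)-\frac12}(Y) \tag{xxiii}$$

There are now a variety of ways in which it is possible to express $Y$. In the first
place $P/A = np/a$, where $p$ and $a$ refer to the parent population, but $a/p$ = mean − mode
$= \bar{x} - \check{x}$, say. Again $\dfrac{p}{a} = \dfrac{1}{\bar{x} - \check{x}} = \dfrac{2}{\sqrt{\beta_1}\,\sigma_x}$. Thus we have

$$Y = n\,\frac{\bar{x}_n' - \bar{x}_n}{\bar{x} - \check{x}} = \frac{np(\bar{x}_n' - \bar{x}_n)}{a} = \frac{2n(\bar{x}_n' - \bar{x}_n)}{\sqrt{\beta_1}\,\sigma_x} \tag{xxiv}$$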

Further, we need the value of the $p + 1$ in the degree of the $T_m$ function; we
have

$$p + 1 = \frac{\bar{x}}{\bar{x} - \check{x}} = \frac{\sigma_x^2}{(\bar{x} - \check{x})^2} = \frac{4}{\beta_1} \tag{xxv}$$


Here $\bar{x}$, $\check{x}$, $\sigma_x$ and $\beta_1$ all refer like $p$ to the parent population. Clearly some two of these
quantities, $\bar{x}$ and $\check{x}$, $\bar{x}$ and $\sigma_x$, or $\beta_1$ and $\sigma_x$, must be known, or we cannot determine
$a$ and $p$. We shall see later that in certain other applications $p$ is known, and then
probably $\sigma_x$ is the best quantity to seek for. It might be thought that $\bar{x}$ would be
easy to find. It may be so, if the start of the curve can be determined, but it must
be remembered that $\bar{x}$ is the mean measured from a definite point of the parent
population, i.e. the start of the parent population, and this may be quite unknown;
$\bar{x} - \check{x}$ does not involve this knowledge, but the mode is not an easily determined
character. On the whole $\beta_1$ and $\sigma_x$ can probably be most easily obtained from the
samples. Of course this refers to cases in which the parent population is unknown,
but suspected of having a skewness which may be approximated to by a Type III
curve. The procedure here would be to determine the second and third moment
coefficients of the pooled samples, and thus obtain the best approximation which is
available to $\beta_1$ and $\sigma_x$ of the supposed parent population.


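We then take

$$m = \frac{4n}{\beta_1} - \frac12, \qquad Y = \frac{2n(\bar{x}_n' - \bar{x}_n)}{\sqrt{\beta_1}\,\sigma_x} \tag{xxvi}$$

and test whether the probability integral of $T_m(Y)$ has a value sufficiently large
to justify us in assuming that $\bar{x}_n'$ and $\bar{x}_n$ came from the same population.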
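
As an illustration of (xxiii) to (xxvi), the sketch below computes the order $m$ and the argument $Y$ from pooled-sample estimates of $\beta_1$ and $\sigma_x$. It is an illustrative helper only; the name `mean_difference_statistic` and the numerical inputs are assumptions, not taken from the memoir.

```python
import math


def mean_difference_statistic(n, beta1, sigma_x, xbar_a, xbar_b):
    """Return (m, Y) for two sample means of common size n.

    m = n(p+1) - 1/2 with p + 1 = 4/beta1, as in (xxv);
    Y = 2n (xbar_b - xbar_a) / (sqrt(beta1) * sigma_x), as in (xxiv).
    """
    m = 4.0 * n / beta1 - 0.5
    Y = 2.0 * n * (xbar_b - xbar_a) / (math.sqrt(beta1) * sigma_x)
    return m, Y


# Hypothetical pooled estimates: beta1 = 1 (i.e. p = 3), sigma_x = 1.
m, Y = mean_difference_statistic(n=10, beta1=1.0, sigma_x=1.0, xbar_a=0.0, xbar_b=0.1)
print(m, Y)
```

Since the standard deviation of a $T_m$ curve is $\sqrt{2m+1}$, the ratio $Y/\sqrt{2m+1}$ reduces to the familiar $(\bar{x}_n' - \bar{x}_n)/(\sigma_x\sqrt{2/n})$; the skewness enters only through the shape of the curve, that is through $m$.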

Perhaps a more useful case occurs when one sample is sufficiently large to give 
reasonable values for the constants, and we ask whether the other could have been 
drawn from the same population. In this case we may determine p and a with 
sufficient accuracy from the large sample and measure the probability of $\bar{x}_n$ for the
second sample from (xx) or (xx bis) by aid of the Tables of the Incomplete $\Gamma$-function.








Generally we may state our problem to be this: We wish to know from the 
means of two samples whether they are consistent with these samples having been 
both drawn from the same population. We have no reason for supposing that 
population follows a normal distribution, or we may have good reason for supposing 
its distribution skew. Shall we do better to assume $\beta_1 = 0$ and the unknown parent
population to be normal, or to work with the value of $\beta_1$ found from the pooled
samples? Probably with samples of 25 or 20 the latter would be the wiser course; 
at any rate, on comparison with the former method, it would give us some measure 
of the extent to which skewness might invalidate our conclusions from the normal 
hypothesis. Unfortunately we do not at present know the distribution of the 
variance of samples drawn from a Type III curve, or indeed from any skew curve. 
Had we known it, it might be possible to construct a quantity like “Student's z”
with the additional advantage, however, that it would possess correlation between 
numerator and denominator. 

(8) The Type III curve gives the distribution of frequency for other statistical functions
than that of the means of samples drawn from a Type III distribution. One
of the most important cases is that of the distribution of the standard deviations
(or of the variances) of samples from a normal parent population. If $\Sigma$ be the
standard deviation, $M_2$ the variance of the parent population $= \Sigma^2$, $n$ the size of the
sample, $\sigma$ its standard deviation, $\mu_2$ its variance, we have Helmert's Equation for the
distribution of $\sigma$,

$$y = y_0\left(\frac{\sigma}{\Sigma}\right)^{n-2} e^{-\frac{n\sigma^2}{2\Sigma^2}}\;[d\sigma] \tag{xxvii}$$

or, expressed in terms of the variances, the Type III equation

$$y = y_0'\left(\frac{\mu_2}{M_2}\right)^{\frac12(n-3)} e^{-\frac{n\mu_2}{2M_2}}\;[d\mu_2] \tag{xxviii}$$


Hence if we have two samples of size $n$ with variances $\mu_2$ and $\mu_2'$ taken from normal
parent populations of variance $M_2$ and $M_2'$, the distribution of the difference

$$Y = \frac{n}{2}\left(\frac{\mu_2'}{M_2'} - \frac{\mu_2}{M_2}\right)$$

is given by

$$z = \tfrac12 M\,T_{\frac12(n-2)}(Y) \tag{xxix}$$

We are therefore in a position to determine whether the variances of two samples
each measured in terms of the variance of its parent population are significantly
different.

If the two parent populations are identical, then $Y = \dfrac{n(\mu_2' - \mu_2)}{2M_2}$. If, as may
often be the case, the parent population be unknown, then the only remedy is to
take for $M_2$ the value provided by the two samples pooled. If we know the means
$\bar{x}_n$ and $\bar{x}_n'$ of the two samples to be the same, this will be $\tfrac12(\mu_2 + \mu_2')$, so that the
frequency of the difference will be given by

$$z = \tfrac12 M\,T_{\frac12(n-2)}\left(\frac{n(\mu_2' - \mu_2)}{\mu_2' + \mu_2}\right) \tag{xxx}$$









If on the other hand we know that $\bar{x}_n$ is not equal to $\bar{x}_n'$ we have to put

$$M_2 = \tfrac12(\mu_2 + \mu_2') + \tfrac14(\bar{x}_n' - \bar{x}_n)^2,$$

and have accordingly

$$z = \tfrac12 M\,T_{\frac12(n-2)}\left(\frac{n(\mu_2' - \mu_2)}{\mu_2' + \mu_2 + \tfrac12(\bar{x}_n' - \bar{x}_n)^2}\right) \tag{xxxi}$$

Dr Elderton's Tables provide the ordinates* of the above curves up to samples
of $n = 25$. For large samples at present we are thrown back on the Type VII curve,
the probability integral of which is included in the Incomplete B-function Tables.
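
The pooling rules behind (xxx) and (xxxi) can be put in a few lines. This is a hedged sketch; the function name `variance_difference_statistic` and its optional mean arguments are hypothetical conveniences, not notation from the memoir.

```python
def variance_difference_statistic(n, mu2_a, mu2_b, xbar_a=None, xbar_b=None):
    """Return (m, Y) for two sample variances of common sample size n.

    M2 is the pooled estimate: (mu2_a + mu2_b)/2 when the sample means agree,
    plus (xbar_b - xbar_a)^2 / 4 when the two means are supplied and differ.
    Then m = (n - 2)/2 and Y = n (mu2_b - mu2_a) / (2 M2).
    """
    M2 = 0.5 * (mu2_a + mu2_b)
    if xbar_a is not None and xbar_b is not None:
        M2 += 0.25 * (xbar_b - xbar_a) ** 2
    m = 0.5 * (n - 2)
    Y = n * (mu2_b - mu2_a) / (2.0 * M2)
    return m, Y


# Illustrative numbers only.
print(variance_difference_statistic(10, 4.0, 6.0))            # equal means, form (xxx)
print(variance_difference_statistic(10, 4.0, 6.0, 0.0, 2.0))  # unequal means, form (xxxi)
```

A difference between the sample means enlarges the pooled $M_2$ and so reduces $Y$, making the same difference of variances appear less exceptional.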


(9) If we have a population of large size $M$ consisting of $v$ categories whose
frequencies are

$$m_1, m_2, \ldots, m_v,$$

and a sample of size $N$ be taken giving categories of size

$$n_1, n_2, \ldots, n_v$$

only restricted by the condition $S(n_s) = N$, and we form the quantity

$$\chi^2 = \frac{(n_1 - \bar{n}_1)^2}{\bar{n}_1} + \frac{(n_2 - \bar{n}_2)^2}{\bar{n}_2} + \ldots + \frac{(n_v - \bar{n}_v)^2}{\bar{n}_v} \tag{xxxii}$$

where $\bar{n}_s = m_s N/M$, then the distribution of $\chi$ follows the curve†

$$y = y_0\,e^{-\frac12\chi^2}\chi^{v-2}\;[d\chi] \tag{xxxiii}$$

and the distribution of $\tfrac12\chi^2$ the Type III curve

$$y = y_0'\,e^{-\frac12\chi^2}\left(\tfrac12\chi^2\right)^{\frac12(v-3)}\;[d(\tfrac12\chi^2)] \tag{xxxiv}$$

Now this Type III curve, like that for the variance, has its $p$ known, $= \tfrac12(v-3)$,
which can be found at once from $v$ the number of categories in the sample.
Further, its $a$, i.e. its modal distance, $= \tfrac12(v-3)$, and its standard deviation
$= \sqrt{\tfrac12(v-1)}$. Again, if required, $\beta_1 = \dfrac{8}{v-1}$ and $\beta_2 = 3 + \dfrac{12}{v-1}$.


Thus if we have two $\chi^2$'s, namely $\chi^2$ and $\chi'^2$, from two samples, the distribution
of their difference will be given by

$$z = \tfrac12 M\,T_{\frac12(v-2)}\left(\tfrac12\chi'^2 - \tfrac12\chi^2\right) \tag{xxxv}$$

Accordingly we have obtained a measure of whether two $\chi^2$'s supposed to be due
to sampling from the same population are reasonably probable. Given the $\chi^2$'s
the solution is independent of any knowledge of the parent population, beyond the
number of categories used in determining the $\chi^2$'s.

We may make some remarks on the curve in (xxxv). The standard deviation
of a $T_m$ curve is $\sqrt{2m+1}$ and in our case $m = \tfrac12(v-2)$; therefore, if $Y = \tfrac12\chi'^2 - \tfrac12\chi^2$,

$$\sigma_Y^2 = v - 1 = \tfrac12(v-1) + \tfrac12(v-1) = \sigma^2_{\frac12\chi'^2} + \sigma^2_{\frac12\chi^2},$$

as it should do, since $\tfrac12\chi'^2$ and $\tfrac12\chi^2$ are by hypothesis due to independent samples.

* A probability integral table of $T_m(x)$ accompanies this paper.

† Pearson, Phil. Mag. 1899, p. 989. For properties of the $\chi^2$ curve, see Drapers' Company Research
Memoirs, Biometric Series, vxxx.









Again, $B_1 = 0$, of course, since the $T_m$ curve is symmetrical, and $B_2 = 3 + \dfrac{6}{v-1}$*.

Accordingly, $B_2 - 3 = \tfrac12(\beta_2 - 3)$ of the $\tfrac12\chi^2$ distribution, or the kurtosis is halved,
i.e. the curve is 50% less leptokurtic than the parent population.
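
The moment relations quoted for $\tfrac12\chi^2$ and for the difference of two such quantities follow from the additivity of cumulants. The sketch below assumes the standard cumulants $\kappa_r = k\,(r-1)!$ of a Type III (gamma) variate of index $k = \tfrac12(v-1)$; the function names are hypothetical.

```python
def half_chi2_moments(v):
    """Variance, beta1 and beta2 of (1/2)chi^2 for v categories."""
    k = 0.5 * (v - 1)                 # index p + 1 of the Type III curve
    k2, k3, k4 = k, 2.0 * k, 6.0 * k  # cumulants kappa_r = k (r-1)!
    return {"variance": k2, "beta1": k3**2 / k2**3, "beta2": 3.0 + k4 / k2**2}


def difference_moments(v):
    """Variance, B1 and B2 of the difference of two independent (1/2)chi^2."""
    k = 0.5 * (v - 1)
    k2 = 2.0 * k       # even cumulants of the two variates add
    k4 = 12.0 * k
    return {"variance": k2, "B1": 0.0, "B2": 3.0 + k4 / k2**2}  # odd cumulants cancel


parent = half_chi2_moments(7)
diff = difference_moments(7)
print(parent, diff)
```

For any $v$ the excess $B_2 - 3$ of the difference curve comes out at exactly half of $\beta_2 - 3$ for the parent curve, and the variance at $v - 1$.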

Further, the distribution of the difference does not depend directly† on the

* See above, p. 294, Equation (viii).

† Indirectly, of course, it does, a point too often overlooked. Take, for example, the data for Typhoid
Inoculated and Attacked, due to Greenwood and Yule.

                    Attacked    Not Attacked    Totals
Inoculated              56          6,759        6,815
Non-Inoculated         272         11,396       11,668
Totals                 328         18,155       18,483


What is the exact meaning of the 6,815 and 11,668? Are they simply samples of the two classes
Inoculated and Non-Inoculated, that the recorders have taken, or have they some relation to the numbers
in the community willing or unwilling to be inoculated? If the former, they are subject to the more or
less arbitrary choice of the recorders, and if we find $\chi^2$ we are finding it subject to the supposition that
the recorders repeatedly made experiments with the same numbers. In this case there is only one degree
of freedom in this table, or $k - 1$ degrees, when two populations with $k$ categories are compared. It
seems in many respects more advantageous to treat this problem in the manner it was first investigated,
namely as the comparison of two linear series, when the limitation on the degrees of freedom is seen at
once to arise and arise naturally (Biometrika, Vol. viii. pp. 250-254).

But suppose the numbers of Inoculated and Non-Inoculated arise from some natural division, as in
the case of vaccination and non-vaccination in the country at large, then our table represents an arbitrary
sample of the total population, and there are three degrees of freedom, and this must be borne in mind
in determining the probability of the observed result. In such a case we may or may not know the
relative frequencies of the inoculated and non-inoculated in the population under consideration. If we
do not, the only thing we can do is to use the observed ratio, as if it were the population ratio of the
two classes. In the case of the ratio of inoculated to non-inoculated following as a natural order, i.e. a
table obtained by a random sample out of a general population, where we select an individual without
regard to whether he has been inoculated or attacked and afterwards inquire into details, the $\chi^2$ is simply
proportional to the total number of individuals selected. Thus the $\chi^2$ for the above table is 56.234, but
had we taken a sample of half the size it would be (subject to variation of sampling) 28.117. In other
words the value of $\chi^2$ and accordingly of $P$ depends very largely on the size of the sample, and the
comparison of $\chi^2$'s for two tables of different totals can be made to give almost any value we please to the
probability of the two $\chi^2$'s being due to samples of different sizes from the same parent population. The
quantity which would remain approximately the same would be the $\phi^2$, and in comparing two tables like
the above of different sizes to test whether they come from the same population, it is rather the
comparison of $\phi^2$ and $\phi'^2$ than of $\chi^2$ and $\chi'^2$ which should guide us.
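
The dependence of $\chi^2$ on the size of the sample, and the stability of $\phi^2$, can be reproduced from the table above. The sketch below uses the ordinary fourfold-table formula $\chi^2 = N(ad - bc)^2/\{(a+b)(c+d)(a+c)(b+d)\}$, which is assumed here to be the computation behind the quoted 56.234; the function name `chi2_fourfold` is hypothetical.

```python
def chi2_fourfold(a, b, c, d):
    """Return (chi2, phi2) for the fourfold table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, chi2 / n   # phi2 is the mean square contingency chi2 / N


# Typhoid data: rows Inoculated / Non-Inoculated, columns Attacked / Not Attacked.
chi2, phi2 = chi2_fourfold(56, 6759, 272, 11396)
print(round(chi2, 3), round(phi2, 6))

# Halving every cell (a sample of half the size, ignoring sampling variation)
# halves chi2 and leaves phi2 unchanged.
chi2_half, phi2_half = chi2_fourfold(28, 3379.5, 136, 5698)
print(round(chi2_half, 3), round(phi2_half, 6))
```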

If, on the other hand, the sizes of the two samples of Inoculated and of Non-Inoculated in a table like
that above have been arbitrarily selected, the $\chi^2$ will change widely with those sizes. For example, if we
suppose the table formed by two independent samplings, one of the Inoculated and another of the
Non-Inoculated, and then recording whether they had been attacked or not, the vertical marginal totals are
at our choice, and for five arbitrary sizes of the two samples we have approximately:


          (a)                          (b)                          (c)
   56    6,759    6,815         56    6,759    6,815         56    6,759    6,815
  272   11,396   11,668        159    6,656    6,815         68    2,849    2,917
  328   18,155   18,483        215   13,415   13,630        124    9,608    9,732















size of the samples on which $\chi^2$ and $\chi'^2$ are based, but solely on the number of
cells used in computing the $\chi^2$'s being the same.

A point here is, perhaps, worth noting as we have not seen it recorded. If we
take two sets of $N$ samples each, and $m_{\bar{x}_n}$, $m_{\bar{x}_n'}$ be the means of the means of the
two sets, then $m_{\bar{x}_n}$ and $m_{\bar{x}_n'}$ will be distributed according to a Type III curve, if
the original parent population is, because in this case $\bar{x}_n$ and $\bar{x}_n'$ are so distributed,

          (d)                          (e)
   28    3,379    3,407         28    3,379    3,407
  136    5,698    5,834         68    2,849    2,917
  164    9,077    9,241         96    6,228    6,324

These tables give:

               (a)         (b)         (c)         (d)         (e)
$\chi^2$      56.234      50.135      36.998      28.117      19.6802
$\phi^2$     .003,042    .003,678    .003,802    .003,042    .003,112


A similar variation of $\chi^2$ arises if we take two arbitrary samples of attacked and not attacked, and
inquire as to whether they were inoculated or not. In other words, when there is no “natural” proportion
of the sizes of the two samples which are being compared, $\chi^2$ will vary widely (and accordingly the value
of $P$ by which independence is tested) owing to the size of arbitrarily chosen samples. This variation of
$\chi^2$ and of $P$, when there is no natural proportion in the two samples (as sex for example), is often
overlooked in interpreting results where $\chi^2$ is really largely determined by the size of the samples compared.

Another important point, to which we may draw attention here, is the relation, often postulated as
completely definite, that $\chi^2 = N\phi^2$, where $N$ is the size of the sample. If we couple this relation with the
distribution of $\tfrac12\chi^2$ as given by the equation

$$y = y_0\,e^{-\frac12\chi^2}\left(\tfrac12\chi^2\right)^{\frac12(v-3)} \tag{$\alpha$}$$

where $v$ is the number of cells, then since the Mean of $\chi^2 = v - 1$ and its variance is $2(v-1)$, we should
expect

$$\text{Mean } \phi^2 = \frac{v-1}{N}, \qquad \sigma^2_{\phi^2} = \frac{2(v-1)}{N^2} \tag{$\beta$}$$

Now, if there be $\kappa$ columns and $\lambda$ rows in a contingency table, $v = \kappa\lambda$, but the mean value of $\phi^2$ and the
value of $\sigma^2_{\phi^2}$, even if there be no association, are not

$$(\kappa\lambda - 1)/N \quad\text{and}\quad 2(\kappa\lambda - 1)/N^2 \tag{$\gamma$}$$

see § 17 below of this paper. In the very special case of no association and $N$ large, they only
approximate to

$$(\kappa - 1)(\lambda - 1)/N \quad\text{and}\quad 2(\kappa - 1)(\lambda - 1)/N^2$$

and even then only agree with the values in ($\gamma$), when $\kappa$ and $\lambda$ are indefinitely large, but of a definitely
lower order than $N$. A contingency table is hardly likely to be of practical value under such conditions.
The fact is that when we are studying the $\phi^2$ of a contingency table taken as a sample from an
indefinitely large population, successive samples will not have the same marginal totals and the distribution
of $\phi^2$ is not that of $\chi^2/N$. When we take two series each of definite size, and test their independence by
a ($\chi^2$, $P$) test, we are really dealing with what one of the present writers long ago termed “partial
contingency,” but it behoves the user to state very precisely what is the origin of the totals of his compared
series, and to remember that his $P$ as measuring a degree of independence only applies to repeated
comparison of series both of the same totals as the first, and that he cannot generalise as to the degree
of dependence which would arise had he used other constant sizes. For the sake of statistical students
the senior author of this paper believes it advisable to keep very distinct the usages of $\chi^2$ and $\phi^2$ and not
obscure a difficult topic by assuming $\phi^2$ is merely $\chi^2/N$.










and accordingly we have for the distribution of the difference $m_{\bar{x}_n'} - m_{\bar{x}_n}$ the
curve

$$z = \tfrac12 M\,T_{nN(p+1)-\frac12}\left\{\frac{nNp}{a}\left(m_{\bar{x}_n'} - m_{\bar{x}_n}\right)\right\} \tag{xxxvi}$$

where $M$ is the total number of cases and $p$ and $a$ refer to the original parent
population. We are thus able to test the difference between the means of the means
of sets of samples.

Similarly if $m_{\mu_2}$, $m_{\mu_2'}$ denote the means of the variances of two sets of samples
of $n$, each $N$ in number, taken from a normal parent distribution, then the distribution
of the difference of the variances is given by

$$z = \tfrac12 M\,T_{\frac12\{N(n-1)-1\}}\left\{\frac{nN}{2M_2}\left(m_{\mu_2'} - m_{\mu_2}\right)\right\} \tag{xxxvii}$$

Results (xxxvi) and (xxxvii) may occasionally be useful. 

(10) Lastly suppose we have a contingency table, the number of cells being
$\kappa \times \lambda$, and a sample of size $N$ be taken from it, then we shall find that the mean
square contingency, $\phi^2$, of such a sample, under certain conditions, obeys the law of
distribution

$$y = y_0\,(\epsilon\phi^2)^{p_1}\,e^{-\epsilon\phi^2},$$

but that $\epsilon$ is not equal to $\tfrac12 N$ nor $p_1$ to $\tfrac12(v-3)$, where $v = \kappa\lambda$ the number of cells.
If two samples having mean square contingencies $\phi^2$ and $\phi'^2$ with the same number
of cells $\kappa\lambda$ and of the same size $N$ be drawn under the above-mentioned condition,
then the frequency of their difference will be given by

$$y = \tfrac12 M\,T_{p_1+\frac12}\{\epsilon(\phi'^2 - \phi^2)\} \tag{xxxviii}$$

The conditions referred to will be discussed in a special section later.

But in many cases $\epsilon$ will not equal $\epsilon'$, and it is perhaps in practice a more usual
problem to determine whether $\phi'^2$ may be reasonably supposed to be a sample from
the same population as $\phi^2$, than to deal with $\chi'^2$ and $\chi^2$. The latter contain the total
sizes of the two samples, but the $\phi^2$ and $\phi'^2$ denoting mean square contingency lead
us at once to the problem of whether the coefficients of mean square contingency
$\sqrt{\phi'^2/(1+\phi'^2)}$ and $\sqrt{\phi^2/(1+\phi^2)}$ and so the degrees of association in the two samples
may be considered as reasonably accordant on the hypothesis of the samples being
selected from the same population.

The distribution of $\phi'^2 - \phi^2$ will be considered later. It will be found that, under
certain conditions, it obeys the same law of frequency as the distribution of the
first product moment coefficient $p_{11}$.

(11) There is another method of approaching these problems, and only illustrations
from known curves or surfaces can tell us which method is generally the more
effective or more suitable in a particular type of cases. The whole of our results
depend upon quantities $\bar{x}_n$, $\bar{x}_n'$; $\mu_2$, $\mu_2'$; $\chi^2$, $\chi'^2$; $\phi^2$, $\phi'^2$, which satisfy a surface which
can be thrown into the form

$$w = w_0\,e^{-(V+U)}\,(VU)^{p}\;[dV\,dU].$$









So far we have discussed the difference distribution $V - U$ and shown that it is
given by a $T_m$ function. We may now discuss the distribution of the ratio $z = V/U$*.
Here $V$ and $U$ are independent and can take all values from 0 to $\infty$. If we
integrate out for them and there be $M$ sets of $V$ and $U$, we find $M = w_0\,\Gamma^2(p+1)$, which
determines $w_0$. Now if we consider $U$ and $V$ as measured along two rectangular
axes, $z$ = constant gives a line through the origin at slope $\tan^{-1} z$, and if we transfer
to $z$ and $U$ as variates, we must integrate keeping $z$ constant from $U = 0$ to $\infty$ and
then for $\tan^{-1} z$ from 0 to $\tfrac12\pi$, or $z$ from 0 to $\infty$.

Thus we find

$$w = w_0\,e^{-(1+z)U}\,U^{2p+1} z^{p}\;[dU\,dz].$$

Put $(1+z)U = \xi$ and, keeping $z$ = const., we have

$$w = w_0\,e^{-\xi}\,\xi^{2p+1}\,\frac{z^{p}}{(1+z)^{2p+2}}\;[d\xi\,dz],$$

where $\xi$ goes from 0 to $\infty$.

Now integrate out for $\xi$ and we find

$$w = w_0\,\Gamma(2p+2)\,\frac{z^{p}}{(1+z)^{2p+2}}\;[dz] = M\,\frac{\Gamma(2p+2)}{\Gamma^2(p+1)}\,\frac{z^{p}}{(1+z)^{2p+2}}\;[dz] \tag{xxxix}$$

This is the frequency curve for the distribution of the ratio $z = V/U$ for a population
of $M$ ratios. To find its probability integral, we have the measure $P_{z_0}$ that
the ratio should be greater than $z_0$,

$$P_{z_0} = \frac{\Gamma(2p+2)}{\Gamma^2(p+1)}\int_{z_0}^{\infty}\frac{z^{p}}{(1+z)^{2p+2}}\,dz.$$

Take $1 + z = \dfrac{1}{y}$, $dz = -\dfrac{1}{y^2}\,dy$, and

$$P_{z_0} = \frac{1}{B(p+1, p+1)}\int_0^{\frac{1}{1+z_0}} y^{p}(1-y)^{p}\,dy = \frac{B_{\frac{1}{1+z_0}}(p+1, p+1)}{B(p+1, p+1)} = I_{\frac{1}{1+z_0}}(p+1, p+1) \tag{xl}$$

or the incomplete B-function ratio for the value $\dfrac{1}{1+z_0}$. Here $z_0$ may be $\bar{x}_n'/\bar{x}_n$, or
$\mu_2'/\mu_2$, or $\chi'^2/\chi^2$, according to the problem with which we are dealing. The quantity
$I_x(p', q')$ is that tabled in the Tables of the Incomplete B-function and from them
$P_{z_0}$ can be readily found. In our case we have $p' = q' = p + 1$, or we are confined
to the “diagonal” values of that Table.
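
Equation (xl) can be evaluated without tables by direct quadrature. The following is a minimal sketch, assuming $p + 1 \ge 1$ so that the integrand stays finite; the helper names `incomplete_beta_ratio` and `ratio_tail`, and the use of Simpson's rule, are choices made for this illustration and are not the method of the memoir.

```python
import math


def incomplete_beta_ratio(x, p, q, steps=2000):
    """I_x(p, q) by Simpson's rule on [0, x]; assumes p >= 1, q >= 1, 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0

    def f(t):
        return t ** (p - 1) * (1.0 - t) ** (q - 1)

    h = x / steps                     # steps must be even for Simpson's rule
    total = f(0.0) + f(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    complete = math.gamma(p) * math.gamma(q) / math.gamma(p + q)
    return (total * h / 3.0) / complete


def ratio_tail(z0, p):
    """P_{z0} of (xl): chance that V/U exceeds z0 when both curves have index p."""
    return incomplete_beta_ratio(1.0 / (1.0 + z0), p + 1.0, p + 1.0)


print(ratio_tail(1.0, 1.0))   # z0 = 1 divides the surface symmetrically
print(ratio_tail(3.0, 1.0))
```

For $p = 1$ the result can be compared with the closed form $I_y(2,2) = 3y^2 - 2y^3$.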


* Cases of the distribution of $V/U$ have already been considered by R. A. Fisher, V. Romanovsky,
and E. S. Pearson with J. Neyman. For the purposes of the present paper we give an independent
investigation, which throws the answer back on the Incomplete B-function Table. Fisher has provided
a table which enables the probability of the ratio $U/V$ to be determined by a transformation of Equation
(xxxix).







But the matter would not appear to end here. What we have measured is the
improbability that $V$ should exceed $z_0 U$. But we want to measure a certain degree
of the probable round $z = 1$; we must cut off therefore an equal angle from the
$0U$ axis, if $\psi$ be the angle $z_0$ makes with $0V$. Clearly $\cot\psi = z_0$ and $\tan\psi = \dfrac{1}{z_0}$,

Fig. 2.

therefore the probability that $V/U < \dfrac{1}{z_0}$ or $U/V > z_0$ is given by

$$P_{1/z_0} = \frac{\Gamma(2p+2)}{\Gamma^2(p+1)}\int_0^{1/z_0}\frac{z^{p}}{(1+z)^{2p+2}}\,dz.$$

Taking now $y' = \dfrac{z}{1+z}$, we have

$$P_{1/z_0} = I_{\frac{1}{1+z_0}}(p+1, p+1) = P_{z_0},$$

as it should, for we have cut off equal areas.

Accordingly the total chance that $V/U$ should exceed $z_0$ and $U/V$ exceed $z_0$ is

$$Q_{z_0} = 2\,I_{\frac{1}{1+z_0}}(p+1, p+1) \tag{xli}$$

which may be taken as a measure of the improbability of the ratios $V/U$ and $U/V$
occurring.

The Tables of the Incomplete B-function provide $I_x(p, p)$ up to $p = 50$, and a small
portion of them are reproduced here for comparison. Clearly we need the argument
only up to 0.5. (See Table II.)

(i) Two means $\bar{x}_n$ and $\bar{x}_n'$ of two samples of size $n$ from a Type III parent
population.

$$Q_{\bar{x}_n'/\bar{x}_n} = 2\,I_{\frac{1}{1+\bar{x}_n'/\bar{x}_n}}\{n(p+1),\; n(p+1)\} \tag{xlii}$$






Thus unless the $p$ of the parental population is known, it has to be approximated
to from the samples. Our test would determine whether it was very improbable
that the two samples were drawn from the same ($a$, $p$) curve.

(ii) Two variances $\mu_2$ and $\mu_2'$ of two samples of size $n$ from a supposed normal
population.

$$Q_{\mu_2'/\mu_2} = 2\,I_{\frac{1}{1+\mu_2'/\mu_2}}\{\tfrac12(n-1),\; \tfrac12(n-1)\} \tag{xliii}$$

Our test would determine whether it is likely that both samples were taken from
normal populations of the same variance, but not whether those normal populations
had the same mean.

(iii) Two $\chi^2$'s with the same number of cells $v$ in two samples.

$$Q_{\chi'^2/\chi^2} = 2\,I_{\frac{1}{1+\chi'^2/\chi^2}}\{\tfrac12(v-1),\; \tfrac12(v-1)\} \tag{xliv}$$

(iv) In the case of the mean square contingency we have, under conditions to
be discussed in § 16,

$$Q_{\epsilon'\phi'^2/\epsilon\phi^2} = 2\,I_{\frac{1}{1+\epsilon'\phi'^2/\epsilon\phi^2}}(p_1+1,\; p_1+1) \tag{xlv}$$

If $\epsilon' = \epsilon$, we have $Q_{\phi'^2/\phi^2}$, but it would be clearly better to be able to provide $Q_{\phi'^2/\phi^2}$
when $\epsilon$ is not equal to $\epsilon'$ (see p. 304 above), and this will be considered later.

(12) The previous discussion has indicated the necessity of a probability
integral table for the $T_m(x)$ curve. We may write

$$S_m(x) = \int_0^x T_m(x)\,dx \tag{xlvi}$$

A table of $S_m(x)$ has been computed by Miss David (see below). The tabled
value being $S_m(x)$, it follows that $\tfrac12(1+\alpha)$, the usual probability integral, $= .5 + S_m(x)$,
and $\tfrac12(1-\alpha) = .5 - S_m(x)$. Hence the probability that an observation will lie outside
the limits $\pm x$ is given by $1 - 2S_m(x)$. Considering the case of $\chi^2$ and $\chi'^2$, the
difference $\tfrac12\chi'^2 - \tfrac12\chi^2$ and the ratio $\chi'^2/\chi^2$ give equal probability when we
have

$$1 - 2S_{\frac12(v-2)}\left(\tfrac12\chi'^2 - \tfrac12\chi^2\right) = 2\,I_{\frac{1}{1+\chi'^2/\chi^2}}\{\tfrac12(v-1),\; \tfrac12(v-1)\} \tag{xlvii}$$

where $v$ is the number of cells under consideration. If we now give $v$ values from
2 upward, and to $k = \chi'^2/\chi^2$ values from 1 to 100, we are able by means of Table II
to find the right-hand side of the equation. Hence by Table I to determine
$\tfrac12(k-1)\chi^2$, and thus $\chi^2$ and $\chi'^2 = k\chi^2$. The curves thus obtained are plotted in Fig. 3. Both
the arithmetical work of computing their co-ordinates and the draughtsman's work
in producing the diagram were very laborious; the curves asymptote to the axes of
$\chi^2$ and $\chi'^2$, being of course symmetrical. They are not rectangular hyperbolas, although
they might well be described as “hyperboloidea.”
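
The two sides of (xlvii) can be compared directly in a case where both have elementary closed forms. The sketch below is restricted to $v = 5$, so that $m = \tfrac12(v-2) = \tfrac32$ and $\tfrac12(v-1) = 2$; the closed forms $1 - 2S_{3/2}(x) = \tfrac12(2+x)e^{-x}$ and $I_y(2,2) = 3y^2 - 2y^3$ are worked out for this illustration and are not quoted from the memoir, and the function names are hypothetical.

```python
import math


def difference_tail_v5(chi2_a, chi2_b):
    """Left side of (xlvii) for v = 5: 1 - 2 S_{3/2}(x), x = |chi2_b - chi2_a| / 2."""
    x = abs(chi2_b - chi2_a) / 2.0
    return 0.5 * (2.0 + x) * math.exp(-x)


def ratio_tail_v5(chi2_a, chi2_b):
    """Right side of (xlvii) for v = 5: 2 I_y(2, 2), y = 1/(1 + k), k the ratio >= 1."""
    k = max(chi2_a, chi2_b) / min(chi2_a, chi2_b)
    y = 1.0 / (1.0 + k)
    return 2.0 * (3.0 * y**2 - 2.0 * y**3)


# Illustrative pair of values, not taken from the memoir.
print(difference_tail_v5(2.0, 10.0), ratio_tail_v5(2.0, 10.0))
```

A point ($\chi^2$, $\chi'^2$) lies on the $v = 5$ curve of the diagram when the two functions return the same value; whichever returns the smaller probability is the more stringent test at that point.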








Fig. 3.

When the statistical coefficients by which we enter this diagram are not $\chi^2$ and $\chi'^2$, we must replace
them by the following as coordinates:

In case of:

Variances from a normal curve, $\mu_2$ and $\mu_2'$:

$n\mu_2/\Sigma^2$, $n\mu_2'/\Sigma^2$, where $n$ is the common size of the two samples, and $\Sigma^2$ the variance of the parent
population or its substitute.

Means from a Type III curve, $\bar{x}_n$ and $\bar{x}_n'$:

$\dfrac{4n\bar{x}_n}{\sqrt{\beta_1}\,\Sigma}$, $\dfrac{4n\bar{x}_n'}{\sqrt{\beta_1}\,\Sigma}$, where $n$ is the common size of the two samples; $\Sigma$ the standard deviation and $\beta_1$
belong to the parent population or its substitute. The $v$ in both these cases is $= n$.

Two mean square contingencies $\phi^2$ and $\phi'^2$:

… $\phi^2$, … $\phi'^2$, where $\rho$ is the possible range of $\phi^2 = \lambda - 1$, if $\kappa > \lambda$, $v = 2p_1 + 3$, while $p_1$ and $p_2$ are
given by Equation (lviii).

Two Correlation Ratios $\eta^2$ and $\eta'^2$ from a surface of zero correlation:

$(N - n - 2)\eta^2$, $(N - n - 2)\eta'^2$, where $N$ is the size of the sample, $n$ the number of arrays, and $v = n$.

Two Correlation Ratios $\eta^2$ and $\eta'^2$ from a surface of finite correlation:

$2p_2\eta^2$, $2p_2\eta'^2$, where $p_1$ and $p_2$ are given by Equation (lxxiv), and $v = 2p_1 + 3$.

Two multiple correlation coefficients $R^2$ and $R'^2$:

Replace $\eta^2$ and $\eta'^2$ by $R^2$ and $R'^2$ for the above two cases.






Given a value of $v$, say $v = 10$, then for values of $\chi^2$ and $\chi'^2$ lying between the
curve marked $v = 10$ and its asymptotes the ratio gives a lesser probability than the
difference. In other words, the difference test is a more stringent test than the
ratio for all points ($\chi^2$, $\chi'^2$) lying inside a given $v$-curve, that is to say it gives a lesser
probability of the given hypothesis being correct. Since the area inside the curve
is always far larger than the area outside the curve, i.e. between the curve and its
asymptotes, it would thus appear that the difference test will as a general rule be
likely to be the more stringent. But by simply noting the position of the ($\chi^2$, $\chi'^2$)
point on the diagram (Fig. 3) it will be found possible to determine which is the
more stringent test of a given hypothesis in any particular case.

It may occur to the reader that if the $P'$ or the $P$ corresponding to $\chi'^2$ or $\chi^2$, or
indeed both, be so small as to render it improbable that either of the compared series
have a common origin, it is illogical to test whether $\chi'^2$ and $\chi^2$ have relation.
But a little consideration will show this is not so. For example, let $C_1$ and $C_2$ be
two processes of inoculation, and let the two processes be applied and the numbers
attacked under the two processes be recorded, in each case against a non-inoculated
control. Suppose we find in each case from its $\chi^2$ a very slender possibility of the
inoculated and control series being samples of the same parent population; we
conclude that inoculation in this matter is of service. But granted this we are at
liberty to inquire further whether the two processes of inoculation produce results
so divergent that it is unlikely that they themselves could arise from the same
population of inoculations. We are really testing whether one or other process is
the more effective. Generally our problem will turn on the probability of a difference
$\tfrac12\chi'^2 - \tfrac12\chi^2$ or a ratio $\chi'^2/\chi^2$ greater than the observed occurring. Since no value of
$\chi^2$ is “impossible,” this probability is a perfectly definite one whatever the actual
probabilities of $\chi'^2$ and $\chi^2$ themselves may be. It may be very improbable that the
$\chi'^2$ sample belongs to a parent population A, or that the $\chi^2$ sample belongs to a
parent population B; neither can be impossible, and accordingly there is no logical
reason to hinder us from testing the probability of the combined difference or ratio
occurring. All we must be careful about is the interpretation we give to our result.


(13) Construction of Table I.

This table was computed in the following manner by Miss F. N. David.
It is known that*

$$\frac{d}{dx}K_n(x) = \frac{n}{x}K_n(x) - K_{n+1}(x) \tag{xlviii}$$

while

$$T_m(x) = \frac{x^{m}K_m(x)}{2^{m}\sqrt{\pi}\;\Gamma(m+\tfrac12)} \tag{xlix}$$

Substituting (xlix) in (xlviii) we have, after a slight reduction,

$$T_{m+1}(x) = \frac{2m}{2m+1}\,T_m(x) - \frac{x}{2m+1}\,\frac{dT_m(x)}{dx} \tag{l}$$

an equation providing the differential coefficient of $T_m(x)$.


* This follows at once from the equations in Biometrika, Vol. xxi. pp. 181 (footnote) and 184.








Integrating from $x = 0$ to $x$, we find

$$S_{m+1}(x) = \frac{2m}{2m+1}\,S_m(x) - \int_0^x \frac{x}{2m+1}\,\frac{dT_m(x)}{dx}\,dx,$$

and integrating the last integral by parts, we conclude that

$$S_{m+1}(x) = S_m(x) - \frac{x}{2m+1}\,T_m(x) \tag{li}$$

Now since Dr E. M. Elderton's Table gives $T_m(x)$, we can from a knowledge of
$S_m(x)$ find $S_{m+1}(x)$, and thus by repeated use of (li) build up the table of $S_m(x)$.
Since we require $m$ to advance by 0.5 intervals, we need to find $S_{\frac12}(x)$ and $S_0(x)$.

Now $S_{\frac12}(x) = \int_0^x T_{\frac12}(x)\,dx$, and $T_{\frac12}(x) = \tfrac12 e^{-x}$;

accordingly

$$S_{\frac12}(x) = \tfrac12\left(1 - e^{-x}\right).$$

Thus $S_{\frac12}(x)$ could be and was calculated from Glaisher's Table of the Exponential.
From this value of $S_{\frac12}(x)$ all the values of $S_m(x)$ for $m = 1.5, 2.5, \ldots, 11.5$ in Table I
were computed by (li) in succession.
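
The half-integer columns of such a table can be rebuilt with the recurrence (li). The sketch below is not Miss David's procedure: in place of Dr Elderton's table of ordinates it computes $T_m(x)$ from (xlix), using the elementary form $K_{\frac12}(x) = \sqrt{\pi/2x}\,e^{-x}$ and the standard recurrence $K_{\nu+1}(x) = K_{\nu-1}(x) + (2\nu/x)K_\nu(x)$. All function names are hypothetical, and $x > 0$ is assumed.

```python
import math


def bessel_k_half_integer(twice_order, x):
    """K_{twice_order/2}(x) for odd twice_order >= 1 and x > 0, by upward recurrence."""
    k_prev = math.sqrt(math.pi / (2.0 * x)) * math.exp(-x)   # K_{1/2}
    if twice_order == 1:
        return k_prev
    k_curr = k_prev * (1.0 + 1.0 / x)                        # K_{3/2}
    nu = 1.5
    while 2 * nu < twice_order:
        k_prev, k_curr = k_curr, k_prev + (2.0 * nu / x) * k_curr
        nu += 1.0
    return k_curr


def t_half_integer(m, x):
    """Ordinate T_m(x) of (xlix) for m = 0.5, 1.5, 2.5, ..."""
    k = bessel_k_half_integer(int(round(2 * m)), x)
    return x**m * k / (2.0**m * math.sqrt(math.pi) * math.gamma(m + 0.5))


def s_half_integer(m_max, x):
    """Return {m: S_m(x)} for m = 0.5, 1.5, ..., m_max, built up by (li)."""
    m = 0.5
    s = 0.5 * (1.0 - math.exp(-x))    # S_{1/2}(x)
    table = {m: s}
    while m < m_max:
        s -= x * t_half_integer(m, x) / (2.0 * m + 1.0)
        m += 1.0
        table[m] = s
    return table


print(s_half_integer(11.5, 2.0))
```

The first two steps can be checked by hand: $T_{\frac32}(x) = \tfrac14(1+x)e^{-x}$ gives $S_{\frac32}(x) = \tfrac12 - \tfrac14(2+x)e^{-x}$, and $T_{\frac52}(x) = \tfrac1{16}(x^2+3x+3)e^{-x}$ gives $S_{\frac52}(x) = \tfrac12 - \tfrac1{16}(x^2+5x+8)e^{-x}$.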

The value of $S_0(x) = \int_0^x T_0(x)\,dx$ is not so easy to determine, because $T_0(x)$ is
infinite when $x = 0$, and no quadrature formula is applicable. It was therefore
resolved that $S_1(x) = \int_0^x T_1(x)\,dx$ should first be found by quadrature, and then

$$S_0(x) = S_1(x) + x\,T_0(x)$$

be found from the result, since $L_{x \to 0}\{x\,T_0(x)\} = 0$, which surmounts the difficulty, and
thus $S_0(x)$ was determined. The values of $S_1(x)$ obtained by quadratures from
$T_1(x)$ were computed by Mr E. C. Fieller, and appear in the column under 1.0
of Table I. The ordinates were taken at intervals of 0.02 from $x = 0$ to 0.6, of 0.1 from
$x = 0.6$ to 4.0, and after $x = 4.0$ up to $x = 18.5$ by intervals of 0.5. The work was
laborious, the ordinates being calculated to eight figure accuracy, but the areas,
given to eight figures, were scarcely to be trusted to the last digit, where there
might be an error of 1 to 2. Thus the seventh decimal might sometimes, but rarely,
be in error in a unit. For this reason Miss David's Table computed to eight figures
was cut down to six for publication. Although the values of $S_m(x)$ for integer values
+ 0.5 of $m$ could be obtained with any desired degree of accuracy, those for integer
values only depended on a quadrature, which it was difficult to make reliable to
eight decimal places. As a matter of fact Miss David's eight figure table was used
for all the illustrations which follow, but as linear interpolation was employed* as
adequate for the purpose we had in view, we should have got nearly the same final
results from the six figure table now published.


Those who have occasion to use the table must be careful to note that from $x = 0$ to 4.0, the table advances by 0.1, but from $x = 4.0$ to $x = 18.0$ by 0.5, and this change must be borne in mind when interpolating into the table.
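The change of tabular interval matters because a linear interpolation must use the width of the interval that actually encloses the argument. A minimal sketch follows, in which the grid mirrors the spacing just described and the column is a stand-in filled with $S_{1/2}(x) = \tfrac12(1 - e^{-x})$ rather than the published figures; `GRID`, `COLUMN` and `interpolate` are hypothetical names.

```python
import bisect
import math

# Arguments of the table: 0.0, 0.1, ..., 4.0, then 4.5, 5.0, ..., 18.0.
GRID = [round(0.1 * i, 1) for i in range(41)] + [4.0 + 0.5 * i for i in range(1, 29)]

def interpolate(xs, ys, x):
    """Linear interpolation between the two tabulated arguments enclosing x."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the table")
    hi = min(max(bisect.bisect_right(xs, x), 1), len(xs) - 1)
    lo = hi - 1
    frac = (x - xs[lo]) / (xs[hi] - xs[lo])   # local interval: 0.1 or 0.5
    return ys[lo] + frac * (ys[hi] - ys[lo])

# Stand-in column (not the published table): S_{1/2} at the grid points.
COLUMN = [0.5 * (1.0 - math.exp(-x)) for x in GRID]

print(interpolate(GRID, COLUMN, 3.95), interpolate(GRID, COLUMN, 4.25))
```

Looking up the enclosing pair of arguments, rather than assuming a fixed step, is what keeps the fractional part correct on both sides of $x = 4.0$.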


* In a few cases where the value of x led us to the top of the table higher differences were introduced.




Karl Pearson, S. A. Stouffer and F. N. David 311


If we are dealing with $\nu$ categories, $m = \tfrac12(\nu - 2)$, and $\nu$ is the number of cells indicated by $n$ at the head of the column.

(14) Construction of Table II.

The probability of a ratio, e.g. $\lambda = \chi'^2/\chi^2$, is given by Equation (xlvii), and demands a table of $I_{1/(1+\lambda)}\{\tfrac12(\nu-1), \tfrac12(\nu-1)\}$, where $I_x(p, q)$ is the incomplete B-function ratio, or

$$I_x(p, q) = \frac{\int_0^x x^{p-1}(1-x)^{q-1}\,dx}{\int_0^1 x^{p-1}(1-x)^{q-1}\,dx} \qquad \text{(lii)}.$$

An important relation between two kinds of B-function ratios may be noted here,

$$I_x(p, p) = \tfrac12\{1 + I_{x'}(\tfrac12, p)\} \qquad \text{(liii)},$$

where $x' = 4(x - \tfrac12)^2$.

In the actual table* we should not find $I_{x'}(\tfrac12, p)$ but only $I_{1-x'}(p, \tfrac12)$. The relation between them is

$$I_{x'}(\tfrac12, p) = 1 - I_{1-x'}(p, \tfrac12);$$

thus we modify (liii) and put†

$$I_x(p, p) = 1 - \tfrac12\,I_{1-x'}(p, \tfrac12) \qquad \text{(liii bis)},$$

where $x' = 4(x - \tfrac12)^2$.

In actual practice this relationship may be of considerable value as transforming 
a value of the incomplete B-function from a part of the table where interpolation is 
difficult to another part where it is easier. 
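The transformation can be checked numerically on the case $I_{.7}(6, 6)$ worked in the footnote. The sketch below evaluates the incomplete B-function ratio by Simpson's rule; that is a convenience chosen here, not the method by which the manuscript table was computed, and `inc_beta` is a hypothetical helper.

```python
import math

def inc_beta(x, p, q, steps=4000):
    """Incomplete B-function ratio I_x(p, q) by Simpson's rule (p >= 1, x < 1)."""
    f = lambda t: t ** (p - 1.0) * (1.0 - t) ** (q - 1.0)
    h = x / steps                                   # steps must be even
    total = f(0.0) + f(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    complete = math.gamma(p) * math.gamma(q) / math.gamma(p + q)
    return total * h / 3.0 / complete

x, p = 0.7, 6.0
x_dash = 4.0 * (x - 0.5) ** 2                       # = .16
direct = inc_beta(x, p, p)                          # I_.7(6, 6) taken out directly
via_liii_bis = 1.0 - 0.5 * inc_beta(1.0 - x_dash, p, 0.5)
print(direct, via_liii_bis)
```

Both routes should agree with the value .921,775,209 quoted in the footnote to within the accuracy of the quadrature.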

Those who wish to find $I_x(p, p)$ can either use the present paper's Table II or use the values of $P_x(n)$ which have already been published in the Tables of the Probability Integral for Symmetrical Curves issued in Biometrika, Vol. xxii. pp. 274-283, or in Tables for Statisticians (Part II), pp. 169-178. In this case $P_{x'}(n) = \tfrac12\{1 + I_{x'}(\tfrac12, p)\}$ is actually provided‡ and equals $I_x(p, p)$, where

$$x = \tfrac12\,(1 + \sqrt{x'}).$$

The present Table II renders the discovery of the value of $I_x(p, p)$ very easy. It has not been carried further than $x = .50$, for $\lambda$ only takes values from 1 to $\infty$.

* Now at press, and shortly to be issued. 

† For example, consider $I_{.7}(6, 6)$; its value taken out directly is .921,775,209. Now $x = .7$, $x' = .16$, and $1 - x' = .84$, thus the table gives

$$I_{.84}(6, \tfrac12) = .156,449,582.$$

Hence $1 - \tfrac12 I_{.84}(6, \tfrac12) = 1 - .078,224,791 = .921,775,209$,

which is the value of $I_{.7}(6, 6)$ found directly.

‡ Thus in the example of the previous footnote, we must look out under $\tfrac12(n - 1) = 6$, and $x' = .16$, and we find $.921,775,2 = I_{.7}(6, 6)$ to seven figures.







Table II was extracted from the Incomplete B-function manuscript by Miss 
M. T. Beer, and, as that table does not go further than 10*5 for the half-unit 
intervals, the values for 11*5 were computed by Miss Brenda Stoessiger de novo 
in order that the range of Tables I and II might be the same. We have cordially 
to acknowledge their aid as well as that of Miss M. Kirby for the diagrams, in 
particular for Fig. 8. 

(15) Illustrations of the Method of using Tables I and II, and of the value of Fig. 3.

Illustration (i). The following tables are taken from a paper by K. Pearson: A 
Study of Trypanosome Strains *. 


TABLE i (a) of Memoir.
Length of Trypanosomes in Microns.

Goat as Host                12 and under   13   14   15   16   17 and over   Totals
Wild G. morsitans Strain         37        55   60   32   12        4          200
Wild Game Strain                 17        37   73   38   26        9          200


TABLE i (b) of Memoir.
Length of Trypanosomes in Microns.

Dog as Host                 12 and under   13   14   15   16   17 and over   Totals
Wild G. morsitans Strain         17        34   41   40   19        9          160
Wild Game Strain                 12        31   57   50   24        6          180


If we apply the method of Biometrika, Vol. viii. pp. 250-254, to ascertain whether the Wild G. morsitans Strain and Wild Game Strain are probably samples from the same population, we find $\chi'^2 = 17.216$ from Table i (a) and $\chi^2 = 4.745$ from Table i (b), leading to P' = .0042 and P = .4499 in the two cases respectively for six cells. From the goat as host we should probably argue that the two strains of trypanosomes were different, from the dog as host that they were the same. While the $\chi'^2$ for the goat is very improbable, we must remember that it is not impossible. Two possibilities now arise: (a) the two strains are not differentiated by their hosts, (b) the two strains are differentiated by their host in the same manner. Are the two $\chi^2$'s compatible with each other on either of these hypotheses? We have

$$\tfrac12\chi'^2 - \tfrac12\chi^2 = 6.2355 \quad \text{and} \quad \chi'^2/\chi^2 = 3.6282.$$

* Biometrika , Vol. x. 1914—1915, pp. 117—118. 






What do our two tests give us for the probability of compatibility in these two $\chi^2$'s? We have for the difference test

$$P_{\frac12\chi'^2-\frac12\chi^2} = 2\{.5 - S_2(6.2355)\} = 2\{.5 - .493,187\} = .0136, \text{ from Table I},$$

$$Q_{\chi'^2/\chi^2} = 2I_{.2161}(2.5, 2.5) = 2 \times .091,871,73 = .1837, \text{ from Table II}.$$

Fig. 3, p. 308, indicates at once that with $\chi'^2 = 17.216$ and $\chi^2 = 4.745$, our point is very considerably inside the curve for $\nu = 6$, or without working out the numerical results we know that the difference test will be more stringent than the ratio test. Clearly the ratio test gives us a moderate probability of either (a) or (b) being the fact, but the difference test suggests that neither hypothesis is correct, or that goat and dog react on the trypanosome strains in different manners. This is in accordance with the P' and P found in the first place for the two tables, but the ratio test being less stringent obscures the first impressions drawn from P' and P. This particular illustration was taken without any knowledge of what the tests would lead to. A similar example, with the $\chi'^2$ smaller, might have made it less easy to draw any definite conclusions from P' and P, while $P_{\frac12\chi'^2-\frac12\chi^2}$ and $Q_{\chi'^2/\chi^2}$ might one or both give rise to conclusive results.
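For six cells the ratio test needs only $I_x(2.5, 2.5)$, which has an elementary closed form under the substitution $t = \sin^2\theta$. The sketch below uses that form as an independent check on the Table II reading; the closed form is a derivation supplied here, not taken from the paper, and `ratio_test_six_cells` is a hypothetical name. The difference test is reproduced only as arithmetic on the Table I entry.

```python
import math

def ratio_test_six_cells(chi2_a, chi2_b):
    """Q = 2 I_x(2.5, 2.5), x = 1/(1 + ratio), ratio = larger / smaller chi-square."""
    ratio = max(chi2_a, chi2_b) / min(chi2_a, chi2_b)
    x = 1.0 / (1.0 + ratio)
    phi = 2.0 * math.asin(math.sqrt(x))
    # I_x(5/2, 5/2) = {phi - (2/3) sin 2phi + (1/12) sin 4phi} / pi
    i_x = (phi - (2.0 / 3.0) * math.sin(2.0 * phi) + math.sin(4.0 * phi) / 12.0) / math.pi
    return 2.0 * i_x

q = ratio_test_six_cells(17.216, 4.745)   # compare with .1837 from Table II
p = 2.0 * (0.5 - 0.493187)                # S_2(6.2355) = .493,187 is read from Table I
print(round(q, 4), round(p, 4))
```

The computed Q differs from the tabled .1837 only in the fourth decimal, a discrepancy of the size one would expect from linear interpolation in the table.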

Illustration (ii). To illustrate the last remark we will take two further tables 
from Pearson’s Memoir on Trypanosomes. They are as follows: 


TABLE ii (a).
Length of Trypanosomes in Microns.

Goat as Host                11 and under   12   13   14   15   16 and over   Totals
Wild G. morsitans Strain         16        21   55   60   32       16          200
Mvera Cattle Strain               5        14   22   26   19       14          100


TABLE ii (b).

Length of Trypanosomes in Microns.



The $\chi^2$ for Table ii (a) obtained with a view to testing whether the Wild G. morsitans and the Mvera Cattle strains could be samples of the same trypanosome population = 5.468, rendering for six cells a probability P = .3646, or we may say the Goat as Host cannot be considered as distinguishing between the two strains. We now turn to Table ii (b) with the Dog as Host and find $\chi'^2 = 6.391$, with the probability P' = .2728. The probability is somewhat less, but far from sufficiently less to enable us to say that the Dog as Host will distinguish between the two strains.

We can now ask on the basis of both tests if it be indifferent whether the differ¬ 
ence between the two strains be tested on Goat or Dog? 

What is the probability in fact that, in the case of these two strains, the Dog 
results might have been obtained from the Goat or the Goat results from the Dog? 

We have for the difference test

$$P_{\frac12\chi'^2-\frac12\chi^2} = 2\{.5 - S_2(.4615)\}, \text{ or, from Table I, } = 2\{.5 - .096,251\} = .8075.$$

Again, for the ratio test, since $\chi'^2/\chi^2 = 1.16885$,

$$Q_{\chi'^2/\chi^2} = 2I_{.46107}(2.5, 2.5), \text{ or, from Table II, } = .8682.$$

We see that from either test there was high probability of the goat or dog as host being indifferent, but the difference test gives slightly the more stringent result as Fig. 3 à priori indicates it must do, although the $(\chi^2, \chi'^2)$ point is not far removed from the $\nu = 6$ curve where the two tests give equivalent results.

We might draw from Illustrations (i) and (ii) the conclusion that when the 
strains are sensibly identical the host is indifferent, but when the strains appear to 
be different one host may give a more marked reaction than another. 

Illustration (iii). Table iii (a) below was obtained from the schedules of Pearson's inquiry into the condition of the Polish and Russian Jew immigrants into the East End of London. Table iii (b) was adapted from Table VII, p. 255, of Franz Boas's work Descendants of Immigrants, New York, Columbia University Press, 1912. The problem to be answered is this: The distributions of Cephalic Indices of the Jewish children born in their adopted country and those born in their land of origin are significantly different. Can this difference be attributed to the same causes in England and in America?

The tables are as follows: 


TABLE iii (a). (Pearson's Data.)
Cephalic Indices (Central Values). 





TABLE iii (b). (Boas’s Data.) 
Cephalic Indices (Central Values). 


Male Jewish Boys
aged 6 to 15 years 

Under 

77 

77‘5 

78*5 


80-5 

81*6 

82*5 

88*5 

84*5 

85*5 

86-5 

87*5 

88 and 
over 



48 

121 

155 

248 

263 


280 

244 

192 

140 

69 

119 



6 

10 

23 


47 

87 

02 

03 

105 

84 

82 

116 


The $\chi^2$ of Table iii (a) is 27.907 corresponding to a P of .0057, and the $\chi'^2$ of Table iii (b) is 257.399 corresponding to a P' < .000,0001. Thus the chance of the distribution of the Cephalic Index of Jewish boys born in England being the same as that of Jewish boys born in Eastern Europe is small; the chance that Jewish boys born in America have the same distribution of Cephalic Index as that of Jewish boys born in Eastern Europe is vanishingly small. Boas attributes the difference for America to the influence of the American environment causing the head shape of Jewish children born in America* to approach the Gentile value. Pearson supposes it may be due in the corresponding English case to some admixture of Gentile blood. Whatever the origins of the difference, we may ask how far is there any likelihood of the differences being due to a common cause. In other words, if we took samples of the children of immigrant Jews before and after immigration, what is the chance that two samples will have a difference in their $\chi^2$ equalling or exceeding that observed? The number of cells is 13, and a recourse to Fig. 3 shows us that the point $(\chi'^2, \chi^2)$ lies well within and away from the $\nu = 13$ curve; the difference method will therefore be far more stringent than the ratio method.

We have $\tfrac12\chi'^2 - \tfrac12\chi^2 = 114.746$ and

$$P_{\frac12\chi'^2-\frac12\chi^2} = 2\{.5 - S_{5.5}(114.746)\}.$$

But Table I shows us that $S_{5.5}(18) = .499,989$ and $S_{5.5}(114.746)$ must be much nearer .5 than this, or

$$P_{\frac12\chi'^2-\frac12\chi^2} < .000,022.$$

Again, $\chi'^2/\chi^2 = 9.2234$, and

$$Q_{\chi'^2/\chi^2} = 2I_{1/10.2234}(6, 6) = 2I_{.0978}(6, 6) = .000,534.$$

Both tests indicate that it is very improbable that the cephalic index divergences have the same cause in America and England, but the difference test is far more stringent.

Thus far we have merely applied the $(\chi^2, \chi'^2)$ test and drawn an apparent conclusion from it, but in doing so we have really overlooked the warning given in the long footnote to p. 302. While in the case of the trypanosomes in Illustrations (i) and (ii), we have been dealing with total frequencies of much the same order in both

* It is important to note that the mere residence in America is not supposed to modify the head shape of children coming to America. It is the fact of birth in America which is credited with the change.















the two sets of tables, so that $N'$ and $N$ of $\chi'^2$ and $\chi^2$ will hardly affect the result; in the present illustration Boas's total number of individuals is nine times Pearson's. Hence, if his proportions remaining the same were reduced to Pearson's total, his $\chi'^2$ would be 28.600 and $\tfrac12(\chi'^2 - \chi^2) = 0.3465$, giving $S_{5.5}(0.3465)$ instead of $S_{5.5}(114.746)$. We should thus reach

$$P_{\frac12\chi'^2-\frac12\chi^2} = .9149,$$

or with a high degree of probability conclude that the difference between Pearson's and Boas's Jews and Gentiles could be attributed to a common source.
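The effect of the reduction can be followed numerically. The sketch takes $\chi^2 = 27.907$ and $\chi'^2 = 257.399$, the values implied by the quoted half-difference 114.746 and ratio 9.2234, and it assumes an explicit polynomial form for $T_{5.5}(x)$ that is not given in this section; `t_5_5` and `s_5_5` are hypothetical names, and the quadrature stands in for a reading of Table I.

```python
import math

def t_5_5(x):
    """Assumed closed form of T_{5.5}(x), x >= 0; it integrates to 1/2 over (0, inf)."""
    poly = ((((x + 15.0) * x + 105.0) * x + 420.0) * x + 945.0) * x + 945.0
    return math.exp(-x) * poly / 7680.0

def s_5_5(x, steps=200):
    """S_{5.5}(x) by Simpson's rule on T_{5.5} from 0 to x (steps must be even)."""
    h = x / steps
    total = t_5_5(0.0) + t_5_5(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * t_5_5(i * h)
    return total * h / 3.0

chi2_pearson = 27.907
chi2_boas_reduced = 257.399 / 9.0                    # about 28.600
half_difference = 0.5 * (chi2_boas_reduced - chi2_pearson)
p_reduced = 2.0 * (0.5 - s_5_5(half_difference))
print(round(chi2_boas_reduced, 3), round(half_difference, 4), round(p_reduced, 4))
```

On these assumptions the reduced half-difference is close to 0.3465 and the probability close to .9149, against a quantity below .000,022 before the reduction.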

We do not say that the process here adopted is wholly legitimate, but it does indicate the need for caution in applying the $(\chi'^2, \chi^2)$ test in either form*, and suggests that a $(\phi'^2, \phi^2)$ test may be better applicable when the marginal totals are so different and so arbitrary†.

Illustration (iv). The matter of the preceding illustration may be pursued in a somewhat different direction. The cephalic indices of Jew and Gentile are markedly divergent. If we take as our Gentiles two such closely allied races as English and Swedish, with an almost identical mean index, will either of our tests suffice to indicate a marked difference between the $\chi^2$'s for the two series?

Two such series are shown in Tables iv (a) and (b). Table iv (a) is taken from a paper by Nathaniel O. M. Hirsch, entitled: “Cephalic Index of American born Children of three Foreign Groups‡.”

Table iv (b) is based on Pearson's data for the Jewish Children of East London, and on his data for English School Children.


TABLE iv (a). (Hirsch's Data.)
Cephalic Index (Central Values).



Here $\chi^2 = 134.1757$, giving a P for 15 cells < .000,0005, and there is no practical probability of the two samples coming from the same population.

* $Q_{\chi'^2/\chi^2} = 2I_{.4939}(6, 6) = .9670$, the difference test being the more stringent.

† The ratio of numbers born in the adopted country to those born in the native land is hardly a “natural” one; for England it is 2.21 and for America 2.85.

‡ American Journal of Physical Anthropology, Vol. x. 1927, pp. 79-90, Table I, p. 80.



















TABLE iv (b). (Pearson's Data.)
Cephalic Index (Central Values).

Males born in England   Under 73.95   74.45   75.45   76.45   77.45   78.45   79.45   80.45
Eastern Jews                 -          -       5       7      10      13      20      28
English                     146         86     167     226     272     342     266     230

Males born in England   81.45   82.45   83.45   84.45   85.45   86.45   86.95 and over   Totals
Eastern Jews              31      24      29      20      19       8          18           232
English                  187     145      99      61      39      26          21          2313


Here $\chi'^2 = 234.6659$ for the 15 cells, and again P' is < .000,0005. Thus both series of data are in accord in indicating that the Jewish male child differs essentially in head shape from either of these series of Gentile male children. But a new problem arises: Is it indifferent whether the Gentiles considered are English or Swedish, what is the probability that $\chi'^2$ and $\chi^2$ could result from two samples drawn from the same Jew-Gentile population?

We will apply first the ratio test. Here $\chi'^2/\chi^2 = 1.748,954$, and we have, since $\nu = 15$, and $\tfrac12(\nu - 1) = 7$,

$$Q_{\chi'^2/\chi^2} = 2I_{.3638}(7, 7) = .3077.$$

Thus on the basis of the ratio test, it is not at all improbable that $\chi'^2$ and $\chi^2$ could have arisen from the same population, i.e. it is indifferent whether Swedish or English boys be compared with the Jews.
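When the number of cells is odd, $\tfrac12(\nu - 1)$ is an integer and the incomplete B-function ratio reduces to a finite binomial sum, which gives a quick check on a Table II reading. This is a sketch under that restriction only; `ratio_test_odd_cells` is a hypothetical name, and the use of the binomial identity is a choice made here, not the paper's procedure.

```python
from math import comb

def ratio_test_odd_cells(ratio, cells):
    """Q = 2 I_x(p, p) with p = (cells - 1)/2 and x = 1/(1 + ratio), cells odd."""
    if cells % 2 == 0:
        raise ValueError("the binomial form needs an odd number of cells")
    p = (cells - 1) // 2
    n = 2 * p - 1
    x = 1.0 / (1.0 + ratio)
    # I_x(p, p) is the chance of at least p successes in 2p - 1 trials of chance x.
    i_x = sum(comb(n, k) * x ** k * (1.0 - x) ** (n - k) for k in range(p, n + 1))
    return 2.0 * i_x

q_fifteen_cells = ratio_test_odd_cells(1.748954, 15)
print(round(q_fifteen_cells, 4))       # compare with .3077 from Table II
```

The same function applied to the 13-cell ratio 9.2234 of Illustration (iii) gives a value near the .000,534 quoted there.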

Now let us consider the difference test. We have $\tfrac12\chi'^2 - \tfrac12\chi^2 = 50.2451$, and $\tfrac12(\nu - 2) = 6.5$. Hence

$$P_{\frac12\chi'^2-\frac12\chi^2} = 2\{.5 - S_{6.5}(50.2451)\}.$$

Now 50.2451 is outside the limits of our Table I and $S_{6.5}(18) = .499,97663$, so $S_{6.5}(50.2451)$ has a greater value than this, and we can only say that

$$P_{\frac12\chi'^2-\frac12\chi^2} < .000,047.$$

In other words, we should conclude that the difference test strongly points to a divergence between the use of English and Swedes as the Gentile factor, and this would certainly be in accordance with the views of anthropologists, and in particular craniologists. We thus see that the greater stringency of the difference test has led us to a result more in accordance with fact than the ratio test, and






this might apparently justify us in granting it a position at least alongside the ratio test as a statistical method. But here again: Is any conclusion legitimate based upon the series to be compared being as arbitrary in size as the four series of these two tables? These sizes are perfectly arbitrary in the two cases, they were determined by the data each observer chanced to collect, and not by any “natural” proportions. All our analysis tells us is: that if a long series of further experiments were made, always with the same totals for the four series, we should have frequencies determined by the above $P_{\frac12\chi'^2-\frac12\chi^2}$ and $Q_{\chi'^2/\chi^2}$. What if we sacrifice the increased accuracy obtained in the case of Hirsch's Jews and Pearson's English and reduce all four series to a common total $M$? Pearson's formula of 1911* gives

$$\frac{\chi^2}{N+N'} = \frac{NN'}{(N+N')^2}\; S\left\{\frac{\left(\dfrac{f}{N} - \dfrac{f'}{N'}\right)^2}{\dfrac{f+f'}{N+N'}}\right\}.$$

Here the part within curled brackets consists only of proportional frequencies and, neglecting influence of random sampling, would remain unchanged if $N$ and $N'$ were modified. Accordingly, if we multiply each $\chi^2$ by

$$\frac{(N+N')^2}{NN'} \times \frac{MM'}{(M+M')^2}$$

we shall reduce it to what would arise if we had the series $M$ and $M'$ instead of $N$ and $N'$. As there is no reason whatever why we should not take as many Jews as Gentiles, we may put $M = M'$, or the multiplier is $(N+N')^2/4NN'$. For Hirsch's data the multiplier is 1.16424, and for Pearson's 3.01753. Thus we have

$${}_m\chi^2 = 156.2127, \qquad {}_m\chi'^2 = 708.1139,$$

giving $\tfrac12\,{}_m\chi'^2 - \tfrac12\,{}_m\chi^2 = 275.9506$ and ${}_m\chi'^2/{}_m\chi^2 = 4.53303$. These lead to

$$P_{\frac12\chi'^2-\frac12\chi^2} = 1 - 2S_{6.5}(275.9506) < .000,047$$

and much less; and

$$Q_{\chi'^2/\chi^2} = 2I_{.18073}(7, 7) = .007,787.$$

Both probabilities now oppose the suggestion that we are merely comparing Jew and Gentile, they indicate that there is a real difference between Jews compared with Swedes and Jews compared with English. The difference test is, however, much the more stringent.
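The equalising multiplier is simple arithmetic on the two totals of a table. The sketch below recomputes it for Pearson's totals of 232 Eastern Jews and 2313 English from Table iv (b); the totals of Hirsch's table are not reproduced in this section, so his multiplier 1.16424 is taken as quoted. `equalising_multiplier` is a hypothetical name.

```python
def equalising_multiplier(n, n_dash):
    """(N + N')^2 / (4 N N'): the factor applied to chi-square when M = M'."""
    return (n + n_dash) ** 2 / (4.0 * n * n_dash)

m_pearson = equalising_multiplier(232, 2313)
chi2_hirsch_equalised = 1.16424 * 134.1757            # multiplier quoted in the text
chi2_pearson_equalised = m_pearson * 234.6659
half_difference = 0.5 * (chi2_pearson_equalised - chi2_hirsch_equalised)
ratio = chi2_pearson_equalised / chi2_hirsch_equalised
print(round(m_pearson, 5), round(half_difference, 3), round(ratio, 4))
```

The multiplier equals 1 when the two series are already of equal size and grows as they become more unequal, which is why Pearson's $\chi'^2$ is trebled while Hirsch's $\chi^2$ is raised by about a sixth.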


Illustration (v). We may now turn to another form of application of our method, namely to judicial statistics. We take our data from Judicial Statistics, England and Wales, 1925 (Criminal Statistics), Table VII, pp. 68-69, and 1930, Table VII, pp. 56-57; published by H.M. Stationery Office in 1927 and 1932 respectively.


* Biometrika, Vol. viii. p. 252.





Males convicted in England and Wales in Assizes and Quarter Sessions, by Age Groups.

TABLE v (a).
Crimes against the Person.

Year     Under 16   16 and under 21   21 and under 30   30 and under 40   40 and under 60   Over 60   Totals
1925        12            137               369               318               297            45       1178
1930         6            124               326               259               257            36       1008
Totals      18            261               695               577               554            81       2186


TABLE v (b).
Crimes against Property with Violence.

Year     Under 16   16 and under 21   21 and under 30   30 and under 40   40 and under 60   Over 60   Totals
1925        19            591              1059               423               287            47       2426
1930        28            877              1389               506               327            61       3188
Totals      47           1468              2448               929               614           108       5614


Superficially it would appear that Crimes against the Person have decreased at 
each age, and Crimes against Property with Violence have increased at each age*. 

The $\chi^2$ for Table v (a) = 2.0207, indicating a value of P for six cells of .8460; the samples for the two years might accordingly have arisen from the same population, or we cannot by this test assert a fall in the five years of Crimes against the Person.

Turning to Table v (b) we have $\chi'^2 = 10.5304$ with P' = .0626; this is not absolutely against the 1925 and 1930 results being samples of the same population (if they were, one sample in about 17 would give a greater discrepancy between the two years than the present one), but it does not, like the P of the $\chi^2$ of the first table, suggest no change in the intensity of crime for the two years.
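Both values can be recomputed from the age-group frequencies with the two-sample form of $\chi^2$, $S\{NN'(f/N - f'/N')^2/(f + f')\}$. The sketch assumes that form is the one intended by the method of Vol. viii, and it takes the 1925 entry for ages 30 and under 40 in Table v (a) as 318, the figure required by the row and column totals; `two_sample_chi2` is a hypothetical name.

```python
def two_sample_chi2(first, second):
    """Chi-square for two frequency series over the same cells."""
    n, n_dash = sum(first), sum(second)
    return sum(
        n * n_dash * (f / n - g / n_dash) ** 2 / (f + g)
        for f, g in zip(first, second)
    )

persons_1925 = [12, 137, 369, 318, 297, 45]
persons_1930 = [6, 124, 326, 259, 257, 36]
property_1925 = [19, 591, 1059, 423, 287, 47]
property_1930 = [28, 877, 1389, 506, 327, 61]

chi2_persons = two_sample_chi2(persons_1925, persons_1930)
chi2_property = two_sample_chi2(property_1925, property_1930)
print(round(chi2_persons, 4), round(chi2_property, 4))
```

The two results should fall close to the 2.0207 and 10.5304 quoted above.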

We may now turn to the usual secondary problem: Is it likely that such changes as are exhibited in the two tables are compatible with a common origin? We ask if the $\chi'^2$ and $\chi^2$ could arise from sampling from a common source. We do not define this common source; it may be that both crimes against the person and against property with violence at each age are decreasing or are stationary or are increasing. None of these possibilities is definitely ruled out by an overwhelming improbability, but some of them are not very probable in either one or other case. We have to consider whether $\chi'^2$ and $\chi^2$ are improbable as a result of sampling from a common population.

* It is to be noted that the data pay no attention to changes in the population of each age group in the five years.

Our Fig. 3 again shows that the difference test will be the more stringent, but we are not so far from the $\nu = 6$ curve as to believe the two tests will differ much in any inference to be drawn from them. We have at once,

$$P_{\frac12\chi'^2-\frac12\chi^2} = 2\{.5 - S_2(4.25485)\} = .0646, \text{ from Table I}.$$

Again

$$Q_{\chi'^2/\chi^2} = 2I_{1/(1+5.21126)}(2.5, 2.5) = 2I_{.16100}(2.5, 2.5) = .0942, \text{ from Table II}.$$

The difference test is again more stringent than the ratio test, the former gives odds of 16 to 1 and the latter of 10 to 1 roughly against a common basis for the two tables. On the whole we should probably conclude that the changes noted might not be attributable to a common source, but we should not venture to be dogmatic about such a conclusion. It will be noticed that the probability $P_{\frac12\chi'^2-\frac12\chi^2}$ is almost the same as the P' for $\chi'^2$, or the difference test pays here little attention to the table in which there is a high probability of the two series being samples of the same population.

Illustration (vi). We will take further data from the same source, and consider whether the changes in the age distributions of those convicted of simple larceny in the years 1925 and 1930 can be attributed to some common cause in the case of the two sexes. The Tables vi (a) and vi (b) are taken from the Judicial Statistics, England and Wales, Table X, p. 81, for 1925, and Table X (a), p. 70, for 1930.

Sex and Age of Persons convicted of Simple Larceny in Courts of Summary Jurisdiction (including Juvenile Courts), England and Wales, 1925 and 1930.

TABLE vi (a).
Males.

Year     Under 14   14 and under 16   16 and under 21   21 and under 30   30 and under 40   40 and under 60   Over 60   Totals
1925       828           726              2682              4786              2775              2352            347     14496
1930       334           588              2662              4937              3273              2474            390     14658
Totals    1162          1314              5344              9723              6048              4826            737     29154

















TABLE vi (b).
Females.

Year     Under 14   14 and under 16   16 and under 21   21 and under 30   30 and under 40   40 and under 60   Over 60   Totals
1925        59            54               172               460               489               537             75      1846
1930        17            36               123               455               516               618             61      1826
Totals      76            90               295               915              1005              1155            136      3672


The main feature of the two tables is the decrease in juvenile and the increase in adult thieving. Is the source of this the same for the two series*?

The $\chi'^2$ for Table vi (a) is 272.6336, which for $\nu = 7$ connotes a probability P' < .000,0001 for the two years being samples of the same population. The $\chi^2$ for Table vi (b) is 42.7161, connoting a probability P < .000,0005 for the two years being samples of the same population. Thus in the case of both males and females there has been a most significant change in the age distributions. We then turn to the problem of whether this change can be attributed to the same source in the two sexes. We have

$$\chi'^2/\chi^2 = 6.3825 \quad \text{and} \quad \tfrac12\chi'^2 - \tfrac12\chi^2 = 114.95875.$$

Turning to the ratio test first,

$$Q_{\chi'^2/\chi^2} = 2I_{.13545}(3, 3) = .0403.$$

On the ratio test accordingly the odds are about 24 to 1 against two such values, $\chi'^2$ and $\chi^2$, occurring, if there were a common source. We should say therefore that it was unlikely, but not excessively improbable that the age changes in larceny were the same for the two sexes.

We next take the difference test,

$$P_{\frac12\chi'^2-\frac12\chi^2} = 2\{.5 - S_{2.5}(114.95875)\}.$$

The value of this $S_{2.5}$ function lies outside our table, but we can say it is considerably greater than $S_{2.5}(18) = .499,9996$, or we have

$$P_{\frac12\chi'^2-\frac12\chi^2} < .000,0008.$$

In other words the difference test shows that the probability of the changes in the two sexes being due to a common source is so vanishingly small that we may safely assert that they are not so due.
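For seven cells the order is $m = 2.5$, and two applications of (ii) starting from $S_{1/2}(x) = \tfrac12(1 - e^{-x})$ give the tail in elementary form, $2\{.5 - S_{2.5}(x)\} = e^{-x}(x^2 + 5x + 8)/8$. That expression is derived here from (i) and (ii) and is not quoted from the paper; `difference_test_seven_cells` is a hypothetical name. It shows how far beyond the tabular limit the observed half-difference lies.

```python
import math

def difference_test_seven_cells(half_difference):
    """P = 2{.5 - S_2.5(x)} = e^{-x} (x^2 + 5x + 8) / 8, with x = |half difference|."""
    x = abs(half_difference)
    return math.exp(-x) * (x * x + 5.0 * x + 8.0) / 8.0

bound_from_table_limit = difference_test_seven_cells(18.0)     # about .000,0008
p_observed = difference_test_seven_cells(114.95875)
print(bound_from_table_limit, p_observed)
```

The bound at $x = 18$ reproduces the .000,0008 obtained from Table I, while the probability at the observed argument is smaller by some forty orders of magnitude.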

* The “source” or “sources” may not be changes in the economic or moral state of the population: instead of being of sociological origin the source may lie in police regulations, or in juridical changes; we hazard no suggestion.










Thus the greater stringency of the difference test leads us to a far more definite 
conclusion than the ratio test does. It may be noted that the marginal totals in 
Illustrations (v) and (vi) are not the arbitrary sizes of samples, they are the actual 
populations of criminals caught and convicted and we cannot modify their 
numbers. 

Illustration (vii). We will now apply our methods to a different type of 
investigation, namely to testing whether the means and variances of small samples 
are differentiated. 

The following data are drawn from Dr M. H. Williams’ measurements of boys 
aged 12 in rural schools in Worcestershire*. 

TABLE vii.

Central Heights in Inches of 12 years old Worcestershire Schoolboys.

Group                     48  49   50    51    52   53  54  55  56  57   58   59  60  61  62  63  64
School E                   -   1    -     3     2    1   -   4   3   -    1    1   -   -   -   -   -
School F                   -   -    -     -     2    3   4   1   4   1    -    -   1   -   -   -   -
Aggregate                  2   1   8.5  22.5  24.5  27  45  50  38  32  18.5   4   6   5   -   -   1
Aggregate less E and F     2   -   8.5  19.5  20.5  23  41  45  31  31  17.5   3   5   5   -   -   1
Combined E and F           -   1    -     3     4    4   4   5   7   1    1    1   1   -   -   -   -


The means and variances of these groups are as follows:

Group                      Mean        Variance      β₁
School E                   54".0000    7.291,667     -
School F                   54".6875    4.006,510     -
Aggregate                  54".7105    6.539,012     .0085
Aggregate less E and F     54".7569    6.128,3283    .0384
Combined E and F           54".34375   5.767,2526    .0163
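The means and variances of the two schools can be recomputed from the frequencies of Table vii. Two points in the sketch are inferences, not statements of the paper: the positions of the School E frequencies are recovered as the Combined row less School F, and the tabulated variances are matched only after subtracting 1/12, which looks like Sheppard's correction for the one-inch grouping. `mean_and_variance` is a hypothetical name.

```python
def mean_and_variance(frequencies, sheppard=True):
    """Mean and variance of a grouped distribution {central value: frequency}."""
    total = sum(frequencies.values())
    mean = sum(h * f for h, f in frequencies.items()) / total
    variance = sum(f * (h - mean) ** 2 for h, f in frequencies.items()) / total
    if sheppard:
        variance -= 1.0 / 12.0        # assumed grouping correction, unit interval
    return mean, variance

school_e = {49: 1, 51: 3, 52: 2, 53: 1, 55: 4, 56: 3, 58: 1, 59: 1}
school_f = {52: 2, 53: 3, 54: 4, 55: 1, 56: 4, 57: 1, 60: 1}
combined = {
    h: school_e.get(h, 0) + school_f.get(h, 0)
    for h in sorted(set(school_e) | set(school_f))
}

for name, group in (("E", school_e), ("F", school_f), ("E and F", combined)):
    print(name, mean_and_variance(group))
```

With the correction applied the three variances agree with the tabulated 7.291,667, 4.006,510 and 5.767,2526; without it each is larger by .083,333.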


If we have no information as to the $\beta_1$ of the aggregate, which sufficiently indicates the symmetry of the distribution, we have (a) the experience that the distribution of stature is very approximately normal for a given age, and (b) the evidence that it is so from the combined samples E and F. The normality of the parent distribution being assumed, we can proceed to test the hypothesis of the equality of the variances for the two schools. The requisite formula is deducible from (xxxi), if we suppose the means of the two samples to be unequal. We have

$$P_{\mu_2'-\mu_2} = 2\left\{.5 - S_{\frac12(n-2)}\!\left(\frac{n(\mu_2' - \mu_2)}{(\mu_2' + \mu_2) + \frac12(\bar{x}_{n'} - \bar{x}_n)^2}\right)\right\}.$$


* “A Statistical Study of Oral Temperatures,” by M. H. Williams, M.B., Julia Bell, M.A. and Karl Pearson. Studies in National Deterioration, Drapers' Company Research Memoirs, No. IX, Table LXII, p. 109, Cambridge University Press.





Now $n = 16$, $\mu_2' - \mu_2 = 3.285,157$,

$\mu_2' + \mu_2 = 11.298,177$,
$\bar{x}_{n'} - \bar{x}_n = .6875$.

Hence

$$P_{\mu_2'-\mu_2} = 2\{.5 - S_7(4.55698)\} = .228,37 \text{ by Table I}.$$


It is thus quite possible that $\mu_2'$ and $\mu_2$ would be equal, if the samples were indefinitely increased.


Let us consider what would happen if we took instead of the variance of the combined samples the variance $M_2$ of the aggregate of the Worcestershire schools, i.e. 6.539,012. In this case the argument of the $S_m$ function is

$$\frac{\tfrac12 n(\mu_2' - \mu_2)}{M_2} = \frac{8 \times 3.285,157}{6.539,012} = 4.019,148$$

and

$$P_{\mu_2'-\mu_2} = .285,09.$$

This makes the equality of the variance in different schools somewhat more probable, but still of much the same order, while if we take the aggregate less Schools E and F, we shall get an intermediate value. This example suggests that it will be adequate in many cases to use the variance of the combined samples in place of the usually unknown variance of the aggregate.


The appropriate equation to use when the samples are of the same size for the ratio is (xliii), or since $\mu_2'/\mu_2 = 1.819,955$, we have


- -258,0! 


The difference test is here slightly more stringent than the ratio test, but neither is incompatible with the hypothesis that the standard deviations may be the same in the two schools. The reader must be cautious in applying Fig. 3 to such a case; he must not use $\mu_2'$ and $\mu_2$ as corresponding to $\chi'^2$ and $\chi^2$ and determine the point $(\mu_2', \mu_2)$ on the diagram. If he did so, he would find that point well outside the $\nu = 16$ curve and so conclude that the ratio test was the more stringent. Equations (xxix) to (xxxi) indicate that the correspondence is between $\chi'^2$ and $n\mu_2'/M_2$ and $\chi^2$ and $n\mu_2/M_2$, or in our particular illustration, the values

$$\frac{16 \times 7.291,667}{5.767,2526} \quad \text{and} \quad \frac{16 \times 4.006,510}{5.767,2526},$$

or 20.229 and 11.115.

Looking this point out on the diagram we find it just inside the $\nu = 16$ curve, showing that the tests will give approximately equal probabilities, but the difference test the smaller one. A like procedure must be adopted in testing on Fig. 3 the relative stringency of other applications of the two methods, i.e. we must inquire from the argument of the $S_m$ function what values correspond to $\chi'^2$ and $\chi^2$. Common factors disappearing in the ratio are apt to mislead, when we apply the difference test.





It will be found generally advisable to use (xxxi) instead of (xxx), unless a preliminary inquiry has settled whether $\bar{x}_{n'}$ and $\bar{x}_n$ may be considered equal. The tests for this may be practically treated as twofold.

(a) If $\bar{x}_2$ and $\bar{x}_1$ be the means of two samples of the same size taken from a normal population of standard deviation $\Sigma$, and the samples be perfectly independent, then $\bar{x}_2 - \bar{x}_1$ will be distributed normally with standard deviation $(2\Sigma^2/n)^{1/2}$. Accordingly if we form the ratio

$$\frac{(\bar{x}_2 - \bar{x}_1)\sqrt{n}}{\sqrt{2}\,\Sigma}$$

we can test for its probability by the integral of the normal curve. When we do not know the parent population, the best value to give to $\Sigma$ appears to be that of the combined samples. But here a question arises: Should $\Sigma^2$ be taken equal to $\tfrac12(\mu_2 + \mu_2')$, which it would be if actually $\bar{x}_2 = \bar{x}_1$ (our hypothesis), or to

$$\tfrac12(\mu_2 + \mu_2') + \tfrac14(\bar{x}_2 - \bar{x}_1)^2,$$

which is the observed value? We incline to the latter alternative. Accordingly

$$\frac{(\bar{x}_2 - \bar{x}_1)\sqrt{n}}{\sqrt{(\mu_2 + \mu_2') + \tfrac12(\bar{x}_2 - \bar{x}_1)^2}} = \frac{.6875 \times 4}{\sqrt{11.534,505}} = .8097$$

in our present case.

The probability of a ratio as large as or larger than this is .20906, and taking the possibility of either sign for $\bar{x}_2 - \bar{x}_1$ we have .4181 for the chance of a deviation of this size. The test therefore indicates that it would be legitimate to consider the difference between $\bar{x}_2$ and $\bar{x}_1$ as due to random sampling.

If we take for $\Sigma^2$ the value 6.539,012 of the aggregate, which would render our theory more accurate*, we find the probability = .4470, and although this differs to some extent from .4181, it leads to precisely the same conclusion.
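Test (a) is a two-sided normal probability and can be reproduced with the complementary error function. The function name in this sketch is hypothetical; the inputs are the mean difference .6875, the sample size 16, and the two candidate values of $\Sigma^2$ given in the text.

```python
import math

def two_sided_normal_probability(mean_difference, n, sigma_squared):
    """Ratio of the mean difference to its standard deviation, and its two-sided chance."""
    ratio = mean_difference * math.sqrt(n) / math.sqrt(2.0 * sigma_squared)
    upper_tail = 0.5 * math.erfc(ratio / math.sqrt(2.0))
    return ratio, 2.0 * upper_tail

ratio_combined, p_combined = two_sided_normal_probability(0.6875, 16, 5.7672526)
ratio_aggregate, p_aggregate = two_sided_normal_probability(0.6875, 16, 6.539012)
print(round(ratio_combined, 4), round(p_combined, 4), round(p_aggregate, 4))
```

The larger variance of the aggregate lowers the ratio and so raises the probability, which is the direction of the change from .4181 to .4470 noted above.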


(b) We may adopt “Student's z test.” Here the same assumption of normality for the parent population is made, but we divide $\bar{x}_2 - \bar{x}_1$ by the observed standard deviation of the difference $= \sqrt{\mu_2 + \mu_2'}$. Thus in our case

$$z = \frac{.6875}{\sqrt{11.298{,}177}} = .20454,$$

from which we obtain the chance of the mean difference of the samples lying outside ±.6875 to be .4406, a value lying between the two values deduced from method (a), and thus confirming the result that the schools very probably have the same mean stature for boys of 12.

But to assume this makes no sensible difference in the conclusions already drawn 
with regard to the variances of the two samples. 
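
The arithmetic of test (a) and the ratio of test (b) can be retraced with a short sketch. It assumes that the factor 4 in the text is $\sqrt{n}$ (so that n = 16), takes the sum of the two sample variances as 11.298,177 from test (b), and replaces the printed probability integral table by the complementary error function; the variable names are illustrative only.

```python
import math

def two_sided_normal_tail(xi):
    """Chance that a standard normal deviate exceeds |xi| in either direction."""
    return math.erfc(abs(xi) / math.sqrt(2.0))

diff = 0.6875          # difference of the two sample means (from the text)
sqrt_n = 4.0           # assumption: the factor 4 in the text is sqrt(n)
sum_var = 11.298177    # mu_2 + mu_2' of the two samples (from test (b))

# Test (a), with Sigma^2 taken from the combined samples (observed value).
two_sigma_sq_obs = sum_var + 0.5 * diff ** 2
xi_obs = diff * sqrt_n / math.sqrt(two_sigma_sq_obs)
p_obs = two_sided_normal_tail(xi_obs)

# Test (a), with Sigma^2 taken from the aggregate population.
sigma_sq_agg = 6.539012
xi_agg = diff * sqrt_n / math.sqrt(2.0 * sigma_sq_agg)
p_agg = two_sided_normal_tail(xi_agg)

# Test (b): the ratio entered in "Student's" table (probability not computed here).
z_student = diff / math.sqrt(sum_var)

print(round(xi_obs, 4), round(p_obs, 4), round(p_agg, 4), round(z_student, 5))
```

The two probabilities agree with the .4181 and .4470 quoted above to the accuracy of the printed tables.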


Illustration (viii). The following data for pulse rate and oral temperature are 
taken from the memoir referred to in the last Illustration, Table XI, p. 84. 

* Not absolutely so, because the theory assumes that we sample from an indefinitely large popula¬ 
tion, and that this population is strictly normal. 



Karl Pearson, S. A. Stouffer and F. N. David 


326 


TABLE VIII. 

Pulse Rate and Body Temperature in Children. 

Pulse Rate    A. 98.4°    B. 99.4°    A and B    All Temperatures 
48–49            —           —           —              3 
50–51            3           —           3             33 
52–53            7           —           7             91 
54–55            6           2           8            189 
56–57            5           2           7            272 
58–59            1           8           9            134 
60–61            —           5           5             88 
62–63            —           3           3             34 
64–65            —           —           —             12 
66–67            —           2           2              7 
68–69            —           —           —              2 
70–71            —           —           —              2 
72–73            —           —           —              — 
74–75            —           —           —              1 
76–77            —           —           —              1 
Totals          22          22          44            869 


Before we discuss our samples let us ascertain something about the parent 
population, of which of course we might have no knowledge, or we might not 
know that pulse rate curves in children are very skew. We find that the following 
are the chief constants of the distribution: 

Mean = 56.8665, Variance = $\mu_2$ = 10.732,853, 
Standard Deviation = $\Sigma$ = 3.276,103, 
$\beta_1$ = .971,479, $\beta_2$ = 6.159,378; 

the distribution curve is therefore decidedly skew and also markedly leptokurtic. 

The distribution of means would have for its constants in the case of samples 
of 22: 

Mean = 56.8665, Variance $= \mu_2/n = .487,857$, 

$B_1 = \beta_1/n = .044,158$, $B_2 = 3 + (\beta_2 - 3)/n = 3.143,608$. 
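
The constants of the distribution of means follow from those of the parent population by the usual rules for a mean of n independent observations. A minimal check, with illustrative variable names:

```python
# Constants of the parent pulse-rate distribution, as quoted in the text.
mu2 = 10.732853
beta1 = 0.971479
beta2 = 6.159378
n = 22  # size of each sample

# Constants of the distribution of means of samples of n.
var_of_mean = mu2 / n
beta1_of_mean = beta1 / n
beta2_of_mean = 3.0 + (beta2 - 3.0) / n

print(round(var_of_mean, 6), round(beta1_of_mean, 6), round(beta2_of_mean, 6))
```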


This is not very widely removed from a normal distribution, and we might well 
conclude that a normal distribution for the means, as we do not know the true 
curve, would be sufficient in this case. An examination of the chart of the $\beta_1$, $\beta_2$ plane* 
indicates that the corresponding Pearson curve would be Type IV, but that we are 
so close to the Gaussian point, that the normal curve would be likely to give us 
quite reasonable results. Hence we are thrown back on (a) or (b) of the last 
Illustration. We have 

$$\xi = \frac{(\bar{x}_n' - \bar{x}_n)\sqrt{n}}{\sqrt{2}\,\Sigma} = \frac{5.727{,}2727\,\sqrt{11}}{3.276{,}103} = 5.798{,}1076 \qquad \text{(liv)}.$$


This corresponds to a probability of about .000,000,007 of such a difference 
occurring, if we based our investigation on a knowledge of the parent population. 
It means that samples confined to temperatures 98.4° cannot be considered as 
randomly chosen or that there is, which we know there to be, a correlation between 
pulse rate and body temperature. But what are we to do if we do not know the 
actual parent population, but suppose it for good reason to be skew? We may put 
our data as follows: 

Pulse-Rate 

Group                               Mean          Variance      $\beta_1$ 
A. Body Temperature 98.4°           54.454,546     4.460,055    .024,044 
B. Body Temperature 99.4°           60.181,818     8.997,245    .343,350 
A and B combined                    57.318,182    14.929,063    .237,756 
Aggregate of all Temperatures       56.866,513    10.732,853    .971,479 

* See p. 66, Tables for Statisticians, Part I. 
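
The normal-theory ratio (liv) and its probability can be checked from the tabulated means and the standard deviation of the aggregate. The sketch below substitutes the complementary error function for the printed probability integral table; names are illustrative.

```python
import math

mean_a = 54.454546       # group A, body temperature 98.4 degrees
mean_b = 60.181818       # group B, body temperature 99.4 degrees
n = 22                   # size of each sample
sigma_parent = 3.276103  # standard deviation of the aggregate distribution

# Ratio (liv): difference of means over its standard deviation sqrt(2/n) * Sigma.
xi = (mean_b - mean_a) * math.sqrt(n / 2.0) / sigma_parent

# Chance of a deviation at least this large with either sign.
p_two_sided = math.erfc(xi / math.sqrt(2.0))

print(round(xi, 4), p_two_sided)
```

The result is of the order 7 in a thousand million, in agreement with the figure quoted in the text.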


Now assuming that pulse rates can be approximately given by a Type III curve, 
we turn to equations (xxiii) and (xxiv). Considering first the argument of the 
T m function, we see that the first value of F in (xxiv) would be appropriate if we 
knew the mean and modal pulse rates, but the latter involves the determination of 
the mode. Now we can determine these quantities from Equation (viii). We have 

p * .1 - i ss 4*117,433 in the present case, a* ** a (1 + and a : hence 

Pi V Pf */p+1 

x aa a Vl + jp. Thus, for the present case, 

4*117,433 x 3 276,103 


V5*117,433 


»5*962,908. 


Mean — mode = x — a ** - = 1*448,210, 
P 

and modal pulse rate, accordingly, is given by 

55*418,303. 

Now 

$$Y = \frac{np\,(\bar{x}_n' - \bar{x}_n)}{a} = \frac{n\sqrt{1+p}\,(\bar{x}_n' - \bar{x}_n)}{\sigma},$$

and for our particular case 

$$Y = 87.003{,}946.$$

Again, the curve being given by 

and n(p +1 ) — £ = 112*083,526, we have, for the distribution of F, 

Z — 2ii 2 .O085 ( Y) f 

and we need the area beyond 87*003,946. 
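
The quantities of this paragraph can be retraced numerically. The sketch takes p = 4.117,433 exactly as quoted in the text and uses the relations $a = p\sigma/\sqrt{p+1}$, mean − mode $= a/p$ and $Y = np(\bar{x}_n' - \bar{x}_n)/a$; variable names are illustrative.

```python
import math

p = 4.117433                 # value of p as quoted in the text
sigma = 3.276103             # standard deviation of the pulse-rate distribution
mean = 56.866513             # mean of the aggregate
n = 22                       # size of each sample
diff = 60.181818 - 54.454546 # difference of the two sample means

a = p * sigma / math.sqrt(p + 1.0)   # distance from start of curve to mode
mean_minus_mode = a / p
mode = mean - mean_minus_mode
Y = n * p * diff / a                 # argument of the T_m function
order = n * (p + 1.0) - 0.5          # order m of the T_m function

print(round(a, 6), round(mode, 6), round(Y, 4), round(order, 6))
```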

But neither $T_m(Y)$ nor $S_m(Y)$ is tabled to such order of the function or to such 
an ordinate. We turn therefore to p. 297, where we see that after $m = 11.5$ the 
$T_m(Y)$ function coincides with Pearson's Type VII curve. In order to obtain this 
Type VII curve, we must write, for the $p + \tfrac12$ in (xviii), p. 297, $n(p+1) - \tfrac12$ of the 
present notation. It then transforms to 

.(lv). 


■* 0 - 


1 + 


T r 


i+i}) 


n(j> + l) + t 


4&(p +1 ) {n(p + 1) + 1\) 

* ?measured from start of the pulse rate ourve, whereas x n ' and x % are absolute values of the means. 






In our particular case $n(p+1) + 2.5 = 115.083,526$, but for such values a Type VII 
curve is for all practical purposes a normal curve, i.e. 

{n(g + l)+2‘5} 

Substituting the value of F, this becomes 


z — z$e 


n (V - *J a n ( p+l)+2*6 
“* * »(p+l) + l 


but the latter factor in the exponent is unity to the same degree of accuracy as we 
have used in passing from the Type VII to the normal curve. 

Thus our Bessel function T m (7) curve has reduced to the normal curve 


precisely the curve from which we obtained our first result that the two means were 
not compatible with being random samples from the pulse rate population. 

It may seem a misfortune that the example chosen does not fall within the 
range of Table I. We will accordingly try if we have any better luck with the 
ratio method. This is provided by Equation (xlii). But about this equation an 
important point must be borne in mind: $\bar{x}_n'$ and $\bar{x}_n$ are not the absolute means, but 
those means measured from the start of the curve. We must therefore subtract $a$ or 
5.962,908 from the modal value or 55.418,303, to get the start of the curve, which 
is accordingly at 49.455,398 pulse rate*. Thus we have 

$\bar{x}_n' = 60.181,818 - 49.455,398 = 10.726,420$ 
and $\bar{x}_n = 54.454,546 - 49.455,398 = 4.999,148$, 

giving the ratio $\bar{x}_n'/\bar{x}_n = 2.145,650$, or 

$$Q_{\bar{x}_n'/\bar{x}_n} = 2\,I_{.31790}(112.583{,}526,\ 112.583{,}526).$$
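
The entries of the ratio test, and the normal approximation which the text goes on to apply to the incomplete B-function, can be sketched as follows. One assumption is made and should be noted: the limits 14.93877 and 5.4407 printed further on are reproduced when the exponent of the transformed integrand is taken as m − 1 rather than m. The complementary error function stands in for the probability integral table.

```python
import math

# Values quoted in the text.
mean_b, mean_a = 60.181818, 54.454546
curve_start = 49.455398      # start of the curve, as printed
m = 112.583526               # n(p + 1), both arguments of the B-function

x_b = mean_b - curve_start   # mean of B measured from the start of the curve
x_a = mean_a - curve_start   # mean of A measured from the start of the curve
ratio = x_b / x_a
x_limit = 1.0 / (1.0 + ratio)  # upper limit of the incomplete B-function

# Assumption: exponent m - 1, which reproduces the printed limits.
scale = math.sqrt(8.0 * (m - 1.0))
upper = scale * 0.5
lower = scale * (0.5 - x_limit)

# Upper limit treated as infinite; twice the upper normal tail beyond `lower`.
q_ratio = math.erfc(lower / math.sqrt(2.0))

print(round(ratio, 6), round(x_limit, 5), round(upper, 5), round(lower, 4), q_ratio)
```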

Again we find no such high values of the incomplete B-function have been tabled. 
Writing $m$ for 112.583,526, we have 

$$Q_{\bar{x}_n'/\bar{x}_n} = 2\,\frac{\int_0^{.31790} x^{m-1}(1-x)^{m-1}\,dx}{\int_0^{1} x^{m-1}(1-x)^{m-1}\,dx}.$$

Put $x = \tfrac12 - x'$ and the transformation gives us 

$$Q_{\bar{x}_n'/\bar{x}_n} = 2\,\frac{\int_{.1821}^{.5} (1 - 4x'^2)^{m}\,dx'}{\int_{-.5}^{.5} (1 - 4x'^2)^{m}\,dx'}.$$

* It must be remarked that in this case as in others the ratio as well as the difference test forces us 
to make appeal by way of knowledge or of hypothesis, to the real or supposed parent population. See 
further on this point p. 328, ftn. 




But $m$ is so large that we can safely replace our curve by the normal curve, and thus 

$$Q_{\bar{x}_n'/\bar{x}_n} = 2\,\frac{\int_{.1821}^{.5} e^{-4mx'^2}\,dx'}{\int_{-.5}^{.5} e^{-4mx'^2}\,dx'}.$$

Write $4mx'^2 = \tfrac12\xi^2$, and we have 

$$Q_{\bar{x}_n'/\bar{x}_n} = \frac{2}{\sqrt{2\pi}} \int_{\sqrt{8m}\times .1821}^{\sqrt{8m}\times .5} e^{-\frac12\xi^2}\,d\xi.$$

Now $\sqrt{8m} \times .5 = 14.93877$ and may be replaced for this integral by $\infty$, and 
$\sqrt{8m} \times .1821 = 5.4407$; thus 

$$Q_{\bar{x}_n'/\bar{x}_n} = \frac{2}{\sqrt{2\pi}} \int_{5.4407}^{\infty} e^{-\frac12\xi^2}\,d\xi,$$

or, using the probability integral table of the normal curve, = .000,000,05. Thus we 
see that on the ratio hypothesis the randomness of the two samples is only slightly 
less improbable than on the difference test. Both involve a knowledge of the pulse 
rate distribution. It may be asked, what can we learn without this knowledge? The 
reply must be that we can only work with the combined A and B as representing 
to some extent* (!) the general pulse rate distribution. This gives 


$$\beta_1 = .237{,}7557, \qquad \sigma = 3.863{,}8146, \qquad p + 1 = 16.823{,}9941$$

and 

$$Y = 133.757{,}708, \qquad m = n(p+1) - \tfrac12 = 369.627{,}871,$$

or $z = \tfrac12 T_{369.627,871}(133.757,708)$, 

a value still farther removed than before from the range of $m$ and $Y$ in the $T_m$ and 
$S_m$ tables, and still more appropriate to the application of a normal curve. The 

* In many cases probably we could appeal to experience as to what sort of value $\beta_1$ is likely to take; 
we know its value for many types of variate, weight, pulse rate, barometer heights, etc. If we do not have 
any suggestion from experience, the only value we can take is that obtained from the combined samples, 
even if it appears ludicrous to find $\beta_1$ on forty cases. But is it after all very much more absurd than 
finding a variance on twenty cases, which the problem also requires? Again, it may be said that the difference 
test makes more appeal to or hypotheses about the supposed or known parent population than the ratio 
test does. But the reader must not forget that the whole theory of $\chi'^2$ in relation to $\chi^2$ is based on the 
assumption that the relative frequencies of the parent population may be replaced by the relative 
frequencies of the combined samples. This is clear enough if we approach $\chi^2$ as Pearson has done in 
Biometrika, Vol. viii. p. 262. 


If $F_s/M$ be the relative frequency of the $s$th category of the parent population $M$, then the true $\chi^2$ is 
given by 


and it is not till we put 


M N + N” 


njt 

* (N + N')* 



i.e. the relative frequency of the combined samples, that we obtain 


the value given by various writers for $\chi^2$. In other words, the very basis of the $\chi^2$ method is an appeal 
to the combined sample relative frequencies as a representation of an infinite parent population. The 
weakness of this when the samples may consist of 10 to 20 individuals is only too obvious. 





normal curve in question will be that given by (lvi). We must now write in 
(liv) $\sigma$ for $\Sigma$, or 3.863,8146 for 3.276,103, and we obtain 

$$\xi = 4.916{,}178,$$

giving a probability 

$$P_{\bar{x}_n' - \bar{x}_n} = .000{,}000{,}88,$$

not nearly so stringent as the result when we know the parent population, but amply 
sufficient to indicate that there exists a real difference between the mean pulse 
rates at temperatures 98.4° and 99.4°. 
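
When the parent population is represented only by the combined samples A and B, the same steps may be repeated with the constants of that combination. The sketch below assumes $p + 1 = 4/\beta_1$ (which reproduces the printed 16.823,9941) and again uses the complementary error function in place of the probability integral table.

```python
import math

n = 22
diff = 60.181818 - 54.454546   # difference of the two sample means
var_combined = 14.929063       # variance of A and B combined
beta1_combined = 0.2377557     # beta_1 of A and B combined

sigma = math.sqrt(var_combined)
p_plus_1 = 4.0 / beta1_combined          # assumption: p + 1 = 4 / beta_1
Y = n * math.sqrt(p_plus_1) * diff / sigma
order = n * p_plus_1 - 0.5               # m = n(p + 1) - 1/2

# Normal curve (liv) with sigma of the combined samples in place of Sigma.
xi = diff * math.sqrt(n / 2.0) / sigma
p_two_sided = math.erfc(xi / math.sqrt(2.0))

print(round(sigma, 7), round(p_plus_1, 4), round(Y, 3), round(order, 3))
print(round(xi, 5), p_two_sided)
```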

We have based the above investigation on the assumption that the parent 
population was a Type III curve with a considerable skewness, but we reach the 
important conclusion that, even with a $\beta_1$ of order 1.0, it will be adequate to apply 
the normal curve as describing the distribution of samples; with smaller samples 
and still greater skewness in the parent population, i.e. $p$ smaller, the $m = n(p+1) - \tfrac12$ 
may be small enough to come into the range of our $S_m$ table, or to bring 

$$I_x\{n(p+1),\ n(p+1)\}$$

into the range, $n(p+1) = 50$, of the B-function tables. 


Unfortunately when we turn to test the standard deviations of the pulse rates 
of temperatures 98*4° and 99*4° we are somewhat at a loss for a method, for, as far as 
we are aware, no one has so far found a curve giving the distribution of either the 
standard deviations or the variances of samples from a Type III curve. All we can 
provide at present is a curve having the same first four moments as the required 
curve. To do so would lead us somewhat away from the main topic of the present 
paper. We can, however, give another illustration of Equation (xxxi) if we assume 
that we shall not go far wrong by using the Type III distribution of to apply 
approximately to this case. 


If we confine our attention to the two samples and their combination, we easily 
find from the table on p. 326, that 

$$\frac{\tfrac12 n\,(\mu_2' - \mu_2)}{\tfrac12(\mu_2' + \mu_2) + \tfrac14(\bar{x}_n' - \bar{x}_n)^2} = \frac{11 \times 4.537{,}190}{14.929{,}063} = 3.343{,}0825$$

and $m = \tfrac12(\nu - 2) = \tfrac12(n - 2) = 10$. 

Thus $P_{\mu_2' - \mu_2} = 2\{.5 - S_{10}(3.343,0825)\}$ 

= .4532 from Table I. 

If we had used the variance of the total temperature curve, i.e. 10.732,853, 
we should have found $P_{\mu_2' - \mu_2} = .3000$. 

Both indicate that it is not unreasonable on our hypothesis to suppose the 
variance of the two samples the same. Turning now to the ratio test, we have, since 
$\mu_2'/\mu_2 = 2.017,2946$, 

$$Q_{\mu_2'/\mu_2} = 2\,I_{.331,423}(10.5,\ 10.5)$$

= .1169 by Table II. 
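
The arguments of both tests for the variances, and an independent numerical value of the incomplete B-function ratio, can be obtained as follows. The integration routine (Simpson's rule on the symmetric B-function integrand) is a stand-in chosen for this sketch and is not the method of the printed Table II, so the result can only be compared with the tabled .1169 to the accuracy of that table.

```python
import math

def symmetric_incomplete_beta(x, a, steps=4000):
    """I_x(a, a) by Simpson's rule; steps must be even, a > 1."""
    log_b = 2.0 * math.lgamma(a) - math.lgamma(2.0 * a)  # log B(a, a)
    def integrand(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0
        return math.exp((a - 1.0) * (math.log(t) + math.log1p(-t)) - log_b)
    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0

n = 22
var_a, var_b = 4.460055, 8.997245   # variances of samples A and B
var_combined = 14.929063            # variance of A and B combined

# Difference test: argument of S_m with m = (n - 2) / 2.
diff_argument = 0.5 * n * (var_b - var_a) / var_combined
m = 0.5 * (n - 2)

# Ratio test: limit and arguments of the incomplete B-function.
ratio = var_b / var_a
x_limit = 1.0 / (1.0 + ratio)
q_ratio = 2.0 * symmetric_incomplete_beta(x_limit, 0.5 * (n - 1))

print(round(diff_argument, 6), m, round(ratio, 6), round(x_limit, 6), round(q_ratio, 4))
```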






This is not entirely opposed to the equality of the two standard deviations, but 
gives that equality only a fourth of the probability. Thus for the first time in these 
illustrations the ratio gives the more stringent test. Let us see if this would à priori 
have been indicated by Fig. 3, p. 308; we must take 

for $\chi'^2$: $\dfrac{n\mu_2'}{2M_2}$ and for $\chi^2$: $\dfrac{n\mu_2}{2M_2}$, 

where $M_2 = \tfrac12(\mu_2' + \mu_2) + \tfrac14(\bar{x}_n' - \bar{x}_n)^2 = 14.929,063$, or 6.6293 and 3.2862 are the 
required values. The diagram indicates that this point is well outside the curve 
$\nu = 22$, and thus the ratio will give the more stringent test. 

We have for the purpose of illustration used throughout both tests, but Fig. 3, 
p. 308, will always enable the investigator to choose à priori the test which provides 
the lesser probability. 


(16) We next turn to the important problem of determining from the mean 
square contingencies whether they may be considered as samples from a common 
parent population. Now if we have a $\kappa \times \lambda$ contingency table, where $\kappa \geq \lambda$, the mean 
square contingency $\phi^2$ must lie between 0 and $\lambda - 1$. 

Thus as far as a Pearson curve may be considered applicable (this is our first 
condition) it must theoretically be of the limited range type or of form 

$$y = y_0\,(\phi_1^2)^{p_1}\,(p - \phi_1^2)^{p_2}\ [d\phi_1^2] \qquad \text{(lvii)},$$

where $p = \lambda - 1$. In the case of such a curve of known range the values of $p_1$ and 
$p_2$ can be determined in terms of the mean $\phi^2$ and of the variance. Let these be 
$\bar{\phi}_1^2$ and $\sigma^2_{\phi_1^2}$, and then we have 

$$p_1 + 1 = \frac{\bar{\phi}_1^2}{p}\left\{\frac{\bar{\phi}_1^2\,(p - \bar{\phi}_1^2)}{\sigma^2_{\phi_1^2}} - 1\right\}, \qquad p_2 + 1 = \frac{p - \bar{\phi}_1^2}{p}\left\{\frac{\bar{\phi}_1^2\,(p - \bar{\phi}_1^2)}{\sigma^2_{\phi_1^2}} - 1\right\} \qquad \text{(lviii)}.$$

Now when the size of the sample is fairly large, the variance of $\phi^2$ will be a small 
quantity compared with the product of the two segments of the range as divided 
by the mean $\bar{\phi}_1^2$, i.e. $\bar{\phi}_1^2\,(p - \bar{\phi}_1^2)$. 

The most unfavourable case for the largeness of either $p_1$ or $p_2$ occurs when they 
are nearly equal and $N$ the size of the sample is small. For example, if $\bar{\phi}_1^2 = .5$ and 
$\sigma^2_{\phi_1^2}$ is of the order .01 and $\kappa = \lambda = 2$, then $p_1 = p_2 = 11$. (lvii) becomes in this case 
a Type II curve and the distribution of means of samples from such a curve has 
not yet been solved in any practically useful manner. But the $\beta_2$ for this case is 
about 2.8 and we should not err greatly by treating the distribution as practically 
normal* and then the distribution of the means of $\phi_1^2$ would also be normal. 
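
The fitting of a limited-range curve to a given mean and variance, as in (lviii), amounts to equating the first two moments of a B-function curve on the range 0 to p. A minimal sketch follows; the function name is hypothetical, and the numerical inputs are those quoted in this paper for the example just given, for Illustration (ix) and for Kondo's 3 × 3 table.

```python
def fit_limited_range_curve(mean, variance, rng):
    """Return (F, p1, p2) for y = y0 * x**p1 * (rng - x)**p2 by moments.

    F is the factor common to p1 + 1 and p2 + 1.
    """
    common = mean * (rng - mean) / variance - 1.0
    p1 = common * mean / rng - 1.0
    p2 = common * (rng - mean) / rng - 1.0
    return common, p1, p2

# Example of the text: mean .5, variance about .01, 2 x 2 table (range 1).
_, p1_ex, p2_ex = fit_limited_range_curve(0.5, 0.01, 1.0)

# Illustration (ix): zero contingency, 5 x 5 table, N = 400 (range 4).
f_ix, p1_ix, p2_ix = fit_limited_range_curve(0.040100, 0.00019199, 4.0)

# Kondo's 3 x 3 table with samples of 200 (range 2).
f_k, p1_k, p2_k = fit_limited_range_curve(0.206503, 0.00433796, 2.0)

print(round(p1_ex, 6), round(p2_ex, 6))
print(round(f_ix, 4), round(p1_ix, 4))
print(round(p1_k, 6), round(p2_k, 6))
```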

Professor Kondo has made a second experiment for a 3 × 3 table with $N = 250$ 
(loc. cit. in footnote below, pp. 411, 419–420). The number of cells is here again 
very limited, but any one who has endeavoured to take several hundred samples of 
contingency for 2 × 2 tables will appreciate the amount of labour involved. In this 
second experiment $p_2$ is large, and it would be adequate practically to replace 
(lvii) by 

$$y = y_0\,(\phi_1^2)^{p_1}\,e^{-p_2\phi_1^2/p} \qquad \text{(lix)}.$$

If $p_1$ were the larger, we should have to measure our variate $\phi_1^2$ from the other end 
of the range and it would be the term with power $p_1$ which would be replaced by 
the exponential. If therefore either $p_1$ or $p_2$ be large, we can write our distribution 
of $\phi_1^2$ 

$$y = y_0'\,(\tfrac12\epsilon\phi_1^2)^{p_1}\,e^{-\frac12\epsilon\phi_1^2} \qquad \text{(lix bis)},$$

where $\epsilon = 2p_2/p$, and $p_1$ and $p_2$ are given by (lviii). In this case the distribution of 
$\phi_1'^2 - \phi_1^2$, if the samples are of the same size and the same number of cells, is 
given by 

$$z = \tfrac12 M\,T_{p_1+\frac12}\{\tfrac12\epsilon(\phi_1'^2 - \phi_1^2)\} \qquad \text{(lx)},$$

and the probability of a difference as great or greater is given by 

$$P_{\phi_1'^2 - \phi_1^2} = 1 - 2S_{p_1+\frac12}\{\tfrac12\epsilon(\phi_1'^2 - \phi_1^2)\} \qquad \text{(lxi)}.$$

The corresponding probability of the ratio $\phi_1'^2/\phi_1^2$ is given by 

$$Q_{\phi_1'^2/\phi_1^2} = 2\,I_{1/(1+\phi_1'^2/\phi_1^2)}(p_1 + 1,\ p_1 + 1) \qquad \text{(lxii)}.$$

Thus the difference test involves the determination of one more constant $p_2$ than 
the ratio test with $p_1$ only. But this does not much increase the labour, as $p_1$ already 
involves a knowledge of both $\bar{\phi}_1^2$ and $\sigma^2_{\phi_1^2}$, which are the laborious quantities to 
determine. The values of these quantities to a second approximation are given in 
the memoir by Professor Kondo, already referred to. We will now discuss special cases. 

* Professor Kondo (Biometrika, Vol. xxi. pp. 416–418) has dealt with a case of this kind. He has 
dealt with the observed mean and variance of $\phi_1^2$ in 804 samples of 100, from a parent population of 
$\phi_1^2 = .5$; here the observed value of $\bar{\phi}_1^2 = .499,8006$ rendered the curve slightly skew. 

(17) The problem to be discussed in this section is to find the probability that 
two values $\phi_1'^2$ and $\phi_1^2$ of mean square contingency could have been obtained for 
samples of the same size, $N$, the same number of cells, $\kappa \times \lambda$, with a parent 
population of zero contingency. 

In this case we know that approximately* 

$$\bar{\phi}_1^2 = (\kappa - 1)(\lambda - 1)\,\frac{1}{N}\left(1 + \frac{1}{N}\right) \qquad \text{(lxiii)},$$

- ( - *—^ -(i + 4 )} -< lxiv >- 

We might compute these for any given case N, #, \, and then substitute in 
(lviii). But as the approximation is only of the second order, we can save workers 
some trouble by making the substitution, algebraically neglecting in the process 
terms which we are not warranted in retaining. 


* Kondo, loc. cit. p. 408. 










Remembering that p « \ — 1, if X < #, and calling F the factor common to pi 
and pi, we have 

where v*(*-l)(X-l)-(#-2)-^-j. 

Now 

^i + i-^xJ’-i(*-i)(\-i)^i + ^|i + ^ + ^——— (^+4)j- 

-}(,-1)(x -1) jl + ^ + <«- 1)1 » ...<l„i). 

For the extreme case of N-*<x >, we see that + —1)(X— 1), or we 

have for the ratio test 

- 2/_ i {i(^-l)(X-l), *(^-l)(X- 1)}. 

For the test we found 

Qx'*/x*~ZI 1 {* O'-1H (*-!)}> 

1+xV 

so that we cannot pass to the $\phi_1^2$ test by writing merely $\nu = \kappa\lambda$. This would only 
be true if $\kappa$ and $\lambda$ were very large indeed, which would be very rare in practical 
work. For the like reason $p_1 + \tfrac12$ can only be taken as $\tfrac12(\nu - 2)$, where $\nu = \kappa\lambda$, when 
not only $N$ is very large but $(\kappa + \lambda - 2)/(\kappa\lambda)$ is a very small fraction. For example, if 
$\kappa = \lambda = 10$, or a contingency table of 100 cells, which is very unusual, we can hardly 
assert that 18/100 is a very small fraction. 

Next turning to p* we have 

p ,+1 - imx- 1) {i - ^ (i+i)J (i+£++j>) 

-iif (x- i)(n- - t i).(ixvii). 

Illustration (ix). Suppose $\kappa = \lambda = 5$ and $N = 400$, what is the probability that 
$\phi_1^2 = .02$ and $\phi_1'^2 = .07$ could both be drawn as samples of $N$ from a population of 
zero contingency? 

Here we find from (lxiii) and (lxiv) that $\bar{\phi}_1^2 = .040,100$, and $\sigma^2_{\phi_1^2} = .0001,9199$, 
while $p = \lambda - 1 = 4$. 

We may now proceed to find $p_1$ and $p_2$ either from (lviii), or from (lxvi) and 
(lxvii). We cannot say that (lviii) is more accurate than (lxvi) and (lxvii), because 
in (lviii) we retain terms which are of an order we have neglected in (lxiii) and 
(lxiv), and which ought to be neglected because they have been neglected in passing 
from (lvii) to (lix). Working with (lviii) we find first from (lxiii) and (lxiv), 

$\bar{\phi}_1^2 = .040,100$, $\sigma^2_{\phi_1^2} = .0001,9199$, 






whence the common factor $F = 826.0847$. Further $\bar{\phi}_1^2/p = .010,025$, and 

$$1 - \frac{\bar{\phi}_1^2}{p} = .989{,}975,$$

leading by (lviii) to 

$p_1 = 7.2815$ and $p_2 = 816.7808$. 

Working with (lxvi) we have at once 

$\nu = 16 - 3 - \tfrac12 = 12.5$, 

whence $F = 800\,(1 + .03125 + .0013) = 826.0400$. 

Thus $p_1 = 7.2811$ and $p_2 = 816.7589$. 

We see therefore that the two methods accord well, and further that $p_2$ is so 
large that (lix bis) will adequately express the distribution, where $\tfrac12\epsilon = 204.1897$. 

Finally, by (lxi), the probability of the occurrence of the difference 

$\phi_1'^2 - \phi_1^2 = .05$ 

is given by $P_{\phi_1'^2 - \phi_1^2} = 1 - 2S_{7.7815}(10.2095)$. 

And again $Q_{\phi_1'^2/\phi_1^2} = 2\,I_{.2222}(8.2815,\ 8.2815)$. 

Both involve a twofold interpolation with regard first to the arguments, and 
secondly to the orders of the functions. For the purpose of appreciating the 
probabilities the hyperbolic formula* will suffice. From Table I we find 

$S_{7.7815}(10.2095) = .4920,0818$ 
and thus $P_{\phi_1'^2 - \phi_1^2} = .01598$. 

Again from Table II we have 

$I_{.222,222}(8.2815,\ 8.2815) = .008,324,230$, 
and accordingly $Q_{\phi_1'^2/\phi_1^2} = .01665$. 

Such values would only lead us to say that it is not very probable that $\phi_1'^2$ and 
$\phi_1^2$ were samples from a population having zero contingency. Here, as so often, 
the difference test is, if only slightly, still more stringent than the ratio test. 

In order to determine this à priori from our diagram, Fig. 3, p. 308, we must first 
find $\nu$ from the relation $\tfrac12(\nu - 2) = 7.7815$, or $\nu = 17.563$. The curve corresponding 
to this is nearly mid-way between the curves for $\nu = 17$ and 18. Corresponding to 
$\chi^2$ and $\chi'^2$ we have $\epsilon\phi_1^2$ and $\epsilon\phi_1'^2$, or 8.1676 and 28.5866. This point is just inside 
the $\nu = 18$ curve and so clearly within the $\nu = 17.563$ curve; thus the difference test 
is the more stringent. 
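
The entries with which the tables and the diagram are consulted in Illustration (ix) may be retraced as below. The sketch takes $p_1$ and $p_2$ as quoted in the text, assumes $\tfrac12\epsilon = p_2/p$ (which reproduces the printed 204.1897), and uses the tabled values of $S$ and $I$ as printed rather than recomputing them.

```python
p = 4.0                      # range lambda - 1 for a 5 x 5 table
p1 = 7.2815                  # from (lviii)
p2 = 816.7589                # from the second method, as used in the text
phi_small, phi_large = 0.02, 0.07

order = p1 + 0.5                       # order of the S function
half_eps = p2 / p                      # assumption: (1/2) * epsilon = p2 / p
diff_argument = half_eps * (phi_large - phi_small)
x_limit = 1.0 / (1.0 + phi_large / phi_small)
beta_argument = p1 + 1.0

# Tabled values quoted in the text (hyperbolic interpolation).
s_tabled = 0.49200818
i_tabled = 0.008324230
p_difference = 1.0 - 2.0 * s_tabled
q_ratio = 2.0 * i_tabled

# Entries for the diagram of Fig. 3.
nu = 2.0 * order + 2.0
chi_sq_entries = (2.0 * half_eps * phi_small, 2.0 * half_eps * phi_large)

print(round(diff_argument, 4), round(x_limit, 4), round(p_difference, 5), round(q_ratio, 5))
print(round(nu, 3), [round(v, 4) for v in chi_sq_entries])
```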

(18) The previous section indicates that the solution of the problem of two 
mean square contingencies arising as samples from the same population is relatively 
easy, if that population has zero contingency. This follows from the fact that the 
approximate expressions for $\bar{\phi}_1^2$ and $\sigma^2_{\phi_1^2}$ are known and relatively simple. In the 
case where the contingency is not zero the problem is much harder, as although 
approximate expressions are known for $\bar{\phi}_1^2$ and $\sigma^2_{\phi_1^2}$, they are laborious to determine 
in particular cases. 

* Tables for Statisticians and Biometricians, Part II. Formula (a), p. xviii, i.e. 

*$x - ^00 + + fy*io +*X*n. 


(lxviii). 





Illustration (x). We will first illustrate the method in a particular example 
provided by Kondo and then consider what is needed in order that the values of 
$\bar{\phi}_1^2$ and $\sigma^2_{\phi_1^2}$ might be obtained more readily. 

Kondo took 250 samples of size 200 from an infinite population of a 3 × 3 table 
with the following proportional frequencies*: 

   .0831     .0786     .0270   |   .1887 
   .1032     .2864     .0862   |   .4758 
   .0335     .1235     .1785   |   .3355 
   ------------------------------------- 
   .2198     .4885     .2917   |  1.0000 

From this table Kondo computes 

$\bar{\phi}_1^2 = .206,503$, $\sigma^2_{\phi_1^2} = .0043,3796$. 

Clearly $p = \lambda - 1 = 2$, and thus 

$\bar{\phi}_1^2/p = .103,2515$ and $p - \bar{\phi}_1^2 = 1.793,497$. 

Using Equations (lviii) we find 

$p_1 = 7.712,064$, $p_2 = 74.665,056$, 

only differing slightly from Kondo’s values†. $p_2$ is considerable, and we take it that 
the distribution curve of $\phi_1^2$ may be reasonably represented by (lix bis), or 

$$y = y_0'\,(\tfrac12 \times 74.665{,}056\,\phi_1^2)^{7.712,064}\,e^{-\frac12 \times 74.665,056\,\phi_1^2}.$$

Accordingly the chance of a difference as great as or greater than $\phi_1'^2 - \phi_1^2$ is 

$$P_{\phi_1'^2 - \phi_1^2} = 1 - 2S_{8.212,064}\{\tfrac12 \times 74.665{,}056\,(\phi_1'^2 - \phi_1^2)\},$$

and the chance of a ratio at least as great as $\phi_1'^2/\phi_1^2$ is 

$$Q_{\phi_1'^2/\phi_1^2} = 2\,I_{1/(1+\phi_1'^2/\phi_1^2)}(8.712{,}064,\ 8.712{,}064).$$

Now Kondo has given the 250 values of $\phi_1^2$ which he obtained for his samples; 
the largest of these is .370,502 and the least .054,201‡. Let us inquire what is the 
chance of such a difference occurring in samples of 200 taken from the table above. 

The difference $\phi_1'^2 - \phi_1^2 = .316,301$ and the ratio $\phi_1'^2/\phi_1^2 = 6.835,704$; hence we 
require 

$S_{8.212,064}(11.808,316)$, 

and $I_{.127,621}(8.712,064,\ 8.712,064)$. 

Using the hyperbolic interpolation formula (lxviii), we find 

$S_{8.212,064}(11.808,316) = .496,3113$, 

and $I_{.127,621}(8.712,064,\ 8.712,064) = .000,111,08$. 

Accordingly $P_{\phi_1'^2 - \phi_1^2} = .007,3774$ and $Q_{\phi_1'^2/\phi_1^2} = .000,222,16$. 

* Biometrika, Vol. xxi. p. 411. 

† Biometrika, Vol. xxi. p. 419. 

‡ Ibid. p. 412. 






Referring to Fig. 3, p. 308, we have to enter with $\epsilon\phi_1'^2 = 27.66$, $\epsilon\phi_1^2 = 4.05$ and 
$\nu = 18.424$, or since this point lies well below even the $\nu = 18$ curve the ratio test 
is here the more stringent. With 250 samples of 200 we should expect by the 
difference test 1.8 cases with a $\phi_1^2$ difference as great as or greater than .316. Kondo 
experimentally found one, but a second, .313, runs it close. By the ratio test we 
should expect only .0555, say .06 of a case in 250 samples. Thus while the ratio test 
is the more stringent in this particular case, the difference test accords better with 
Kondo’s experience, and would determine more accurately the range of $\phi_1^2$ in such 
an experiment. It may be remarked that neither Kondo’s maximum nor his minimum 
are outlying values; they run: 

At top .370,502, .367,532, .362,669, .343,805, 

and at bottom .054,201, .063,055, .080,840, .098,408. 

Thus the two methods could not be brought more into accordance by the omission 
of an exceptional outlier. Assuming the arithmetic to be correct, and it has been 
carefully checked, this case seems to be of importance as an indication of the value 
of the difference test when subjected to experimental verification. 
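
The comparison with Kondo's experiment rests on a few lines of arithmetic, sketched here. The tabled values of $S$ and $I$ are taken as printed in the text, and $\tfrac12\epsilon = p_2/p$ is assumed as before; nothing is recomputed from the Bessel or B-function tables.

```python
p = 2.0                      # range lambda - 1 for a 3 x 3 table
p1, p2 = 7.712064, 74.665056 # from (lviii), as quoted
phi_max, phi_min = 0.370502, 0.054201
samples = 250

order = p1 + 0.5
diff_argument = (p2 / p) * (phi_max - phi_min)   # (1/2) * epsilon * difference
ratio = phi_max / phi_min
x_limit = 1.0 / (1.0 + ratio)

# Tabled values quoted in the text.
s_tabled = 0.4963113
i_tabled = 0.00011108
p_difference = 1.0 - 2.0 * s_tabled
q_ratio = 2.0 * i_tabled

expected_by_difference = samples * p_difference
expected_by_ratio = samples * q_ratio

print(round(diff_argument, 6), round(ratio, 6), round(x_limit, 6))
print(round(expected_by_difference, 2), round(expected_by_ratio, 4))
```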

The above illustration shows that there is no difficulty in applying the difference 
test to two values of $\phi_1^2$, if the parent population with a definite contingency be 
supposed known and the two samples have the same size and the like cell distribution, 
provided $\bar{\phi}_1^2$ and $\sigma^2_{\phi_1^2}$ can be determined. When we have no real or hypothetical 
parent population, our only method is to suppose the parent population to be 
obtained from the combined samples which are used to give the relative frequencies. 


(19) The problem remains as to what can be done to simplify the labour of 
obtaining fa* and <r \in the case of contingency tables with a reasonably large size 
of sample. Now Professor Kondo has shown* that, to a second approximation, 


<h 33 <Pi + jy + jfn I 

i At -& + h _ (t 1 + 

** N + N 2 \N + N*) 


,(lxix), 


where fa, are functions of the relative frequencies of the parent population 

—i.e. of its cell frequencies on the basis of a total frequency of unity f—and fa L * is 
the mean squared contingency of the same population $. 

Now if we substitute these values in (lviii), the expressions for $p_1$ and $p_2$ become 
very complicated, even if we only retain the first two terms in the expansion. We 
get simple results if we retain only the leading term. In this case 

$$p_1 = \frac{N\phi_1^4\,(p - \phi_1^2)}{p f_1} - 1, \qquad p_2 = \frac{N\phi_1^2\,(p - \phi_1^2)^2}{p f_1} - 1 \qquad \text{(lxx)}.$$


* Biometrika, Vol. xxi. p. 418. 

† They are in fact the chances that an individual will be drawn from each particular cell. 

‡ $\bar{\phi}_1^2$ is of course the mean of the mean squared contingency of samples and only approaches 
$\phi_1^2$ as $N$ is increased. 







These expressions do not contain fa, fa or /*, which would not therefore require 
calculation. How far can (lxx) be used instead of (lviii) and the fuller but still 
incomplete results (lxix)? This cannot be determined till far more experimental 
work has been undertaken on the two sets of approximations. We may place here the 
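formulae from which $\phi_1^2$ and $f_1$ are to be determined. Let $c_{pq}$ be the chance that an 
individual will be drawn from the $p$th row $q$th column, $c_{p.}$ the chance that it may 
be drawn from anywhere in the $p$th row and $c_{.q}$ from anywhere in the $q$th column. 
Let $C_{pq} = c_{pq}^2/(c_{p.}\,c_{.q})$ and $C_{p.} = \sum_{q=1}^{\lambda} C_{pq}$, i.e. the sum of the $C_{pq}$ for the $p$th row; in the 
same manner let $C_{.q} = \sum_{p=1}^{\kappa} C_{pq}$ or be the sum of $C_{pq}$ for the $q$th column. Then 

$$\phi_1^2 = S(C_{pq}) - 1 \qquad \text{(lxxi)},$$

where $S$ denotes a summation for the whole table, and 

$$f_1 = 4S\left(\frac{C_{pq}^2}{c_{pq}}\right) - 3\sum_{p=1}^{\kappa}\left(\frac{C_{p.}^2}{c_{p.}}\right) - 3\sum_{q=1}^{\lambda}\left(\frac{C_{.q}^2}{c_{.q}}\right) + 2S\left(\frac{C_{p.}\,C_{.q}\,C_{pq}}{c_{pq}}\right) \qquad \text{(lxxii)}.$$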

A sample of the actual working required to obtain an $f_1$ is provided on the following 
page; it is for the table of Kondo’s on p. 334 of this paper. It is considerable, but 
not prohibitive for a table of 3 × 3 cells. We can now illustrate the approximate 
formulae (lxx) on this case where 

$\phi_1^2 = .188,8925$, $p = 2$, $f_1 = .804,0440$. 

These give us $p_1 = 7.0370$ and $p_2 = 76.0590$, whereas when we use (lviii) with the 
fuller values given by Kondo for $\bar{\phi}_1^2$ and $\sigma^2_{\phi_1^2}$ we have $p_1 = 7.7121$ and $p_2 = 74.6651$*. 

We shall have $\tfrac12\epsilon = 38.0295$, and accordingly 

$P_{\phi_1'^2 - \phi_1^2} = 1 - 2S_{7.5370}(12.0288)$. 

From Table I we find 

$S_{7.5370}(12.0288) = .4971,7440$ 
and accordingly $P_{\phi_1'^2 - \phi_1^2} = .00565$. 

Thus in 250 samples we might expect 1.4 occasions on which a difference as 
great as or greater than that observed would arise. Accordingly we have actually 
approached nearer to the experimental result by using the less approximate values 
of $\bar{\phi}_1^2$ and $\sigma^2_{\phi_1^2}$. For such a table as we are dealing with, it is clear that Equations 
(lxx) will amply suffice for practical purposes. 
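
The leading-term approximation can be checked on Kondo's table with the constants just quoted. The sketch assumes the form of (lxx) that reproduces the printed $p_1 = 7.0370$ and $p_2 = 76.0590$, namely the common factor $N\phi_1^2(p - \phi_1^2)/f_1$ divided between the two segments of the range, and it uses the tabled value of $S$ as printed.

```python
N = 200                      # size of each sample
p = 2.0                      # range lambda - 1 for a 3 x 3 table
phi_sq = 0.1888925           # mean squared contingency of the population
f1 = 0.8040440               # leading coefficient of the variance
phi_max, phi_min = 0.370502, 0.054201

# Leading-term approximation: variance taken as f1 / N, mean as phi_sq.
common = N * phi_sq * (p - phi_sq) / f1
p1 = common * phi_sq / p - 1.0
p2 = common * (p - phi_sq) / p - 1.0

order = p1 + 0.5
half_eps = p2 / p
diff_argument = half_eps * (phi_max - phi_min)

s_tabled = 0.49717440        # S value quoted in the text from Table I
p_difference = 1.0 - 2.0 * s_tabled
expected_in_250 = 250 * p_difference

print(round(p1, 4), round(p2, 4), round(half_eps, 4), round(diff_argument, 4))
print(round(p_difference, 5), round(expected_in_250, 2))
```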


(20) We can still further extend the usefulness of our $T_m(Y)$ function and its 
probability integral to many cases where we wish to investigate the difference 
between two squared correlation ratios or two squared multiple correlation 
coefficients. If these be $\eta^2$ and $R^2$, those quantities can only vary between 0 and 1, and 
an appropriate curve for their distributions will be 

$$y = y_0\,(\eta^2)^{p_1}\,(1 - \eta^2)^{p_2}$$

or 

$$y = y_0\,(R^2)^{p_1}\,(1 - R^2)^{p_2} \qquad \text{(lxxiii)}.$$

* Kondo gives $f_1 = .801,842$, but we have failed to find a slip in our arithmetic. This leads to $p_1 = 7.050$, 
$p_2 = 76.271$ and $P = .00490$. 





Actual Working of an $f_1$. 



Cn= *0881 

c l9 = *0786 

c ja ~-0270 

Cj.^-1887 


*0069,0561 

*0061,7796 

•0007,2900 

C t . msSiCm) *■ *246,7601 

c v c, q 

*0414,7626 

•0921,7995 

0550,4379 

<7V = 0608,9065 

Ou 

*166,4955 

067,0206 

013,2440 

/>2 

—=•322*6844 

c'u 

*0277,2075 

•0044,9176 

0001,7540 

Ci. 

C\ql c \q 

*333,5830 

2*003,5560 

067,1471 

•862,6794 

■006,4963 

•490,6186 

s t . =397,2264 


mm 

c*j2= *2864 

Cgg= *0862 

c s .a=*4758 

a 


*0820,2496 

*0074,3044 

^ = 5(C? Se )--608^788 

C2.C. q 

*1045,8084 

•2324,2830 

•1387,9086 

(7 V = *2583,4734 

c* 

•101,8374 

•352,9044 

•063,6370 

^•-•642,9747 

Cg. 

c\ 

*0103,7086 

1245,4152 

0028,6621 

C\/c * 

•100,4928 

•434,8517 

033,2507 

5 a . ("jp 8 ) = -668,6962 

Czq/Ciq 

*986,7965 

1*232,2081 

•621,0789 


<*3i = *0385 

c w =1285 

EB 

c 8 .=* 8855 

<?•*, 

•0011,2225 

0152,5225 

0318,6225 

C a .= -S (C?J,) = -433,8636 

C 3 . C. q 

•0737,4290 

*1638,9175 

0978,6535 

« 1882,2890 

c\ ! 

*015,2184 

0002,3160 

093,0629 

0086,6070 

•325,5723 

1059,9732 

ft 

—5-' — -561,0401 
c 3 . 

^ 2 Hql C Hq 

('3q/ c Zq 

•006,9134 

•454,2806 

070,1271 

•753,5457 

•593,8226 

1-823,9345 

S 3 .(^')= 670,8631 

C.q 

■ 

<j #1 =s *2198 

c.o = *4885 

c. 8 = *2917 

c.. = 1*0000 

c.„ 

•283,5513 

•512,9879 

•392,3533 

fS(C TO )-1-188,8926 
(thus $*= 188,8926 





(&, f = 1-426,6992 

c\ 

•0804,0134 

•2631,5659 

•1539,4111 

J V c p* ' 

[S, (~T~^ “ 1 "432,2344 

£hs. 

c.„ 

•366,7932 

•538,7034 

•627,7378 

s(y“)=#i.+S l .+£«. 





” -1-636,6847 

P s(~e>C p .) 

P s (°f c p .). 




P=l \ c p q / 

\ °j»i / 

p-1 \Cpt J 

p-iVCpj P J 

\ °pq / 


* 1-1930,5669 

-1*1636,4101 

= 1*2280,4222 

-1*417,0530 

Same x C. q 

*338,2928 

•596,9338 

*481,8264 j 



Finally, by (lxxii), 

$f_1 = 4 \times 1.636,6847 - 3 \times 1.426,6992 - 3 \times 1.432,2344 + 2 \times 1.417,0530$ 
$= .804,0440$. 
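
The whole of the working above can be repeated mechanically from the nine proportional frequencies of Kondo's table. The sketch follows (lxxi) and (lxxii) as they are used in the working (coefficients 4, −3, −3, +2); the variable names are illustrative, and the last figures may differ slightly from the hand computation.

```python
# Proportional frequencies c_pq of Kondo's 3 x 3 table (p. 334).
cells = [
    [0.0831, 0.0786, 0.0270],
    [0.1032, 0.2864, 0.0862],
    [0.0335, 0.1235, 0.1785],
]
rows = [sum(r) for r in cells]            # c_p.
cols = [sum(c) for c in zip(*cells)]      # c_.q
idx = [(p, q) for p in range(3) for q in range(3)]

# C_pq = c_pq^2 / (c_p. c_.q), with its row and column sums.
C = [[cells[p][q] ** 2 / (rows[p] * cols[q]) for q in range(3)] for p in range(3)]
C_row = [sum(r) for r in C]
C_col = [sum(c) for c in zip(*C)]

phi_sq = sum(C_row) - 1.0                                   # (lxxi)

term_cells = sum(C[p][q] ** 2 / cells[p][q] for p, q in idx)
term_rows = sum(C_row[p] ** 2 / rows[p] for p in range(3))
term_cols = sum(C_col[q] ** 2 / cols[q] for q in range(3))
term_cross = sum(C[p][q] * C_row[p] * C_col[q] / cells[p][q] for p, q in idx)

f1 = 4.0 * term_cells - 3.0 * term_rows - 3.0 * term_cols + 2.0 * term_cross  # (lxxii)

print(round(phi_sq, 7), round(f1, 6))
```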























Curves of this form are known to be accurate when we are sampling from a popula¬ 
tion, wherein there is no correlation of character and the distribution is of normal type. 
Equations (lviii) to find pi and p% will then apply and we may write them 

y. + 1 - B*(—- l), y, +1 = (1 - S’) -1) ...(Ixriv). 

The values of p x and pa can be determined if the mean and variance of 17 ® or i2® be 
known. 

If either p x or pa be large, say p 2 , we are thrown back on the Type III curve 

y=yo \ n ^ 

or y = y 0 ' (* 2p t 22 *)*» r*H »] . V 

and accordingly the distribution of the difference of the 77 * of two samples, or the 
22 ®, will be given by 

V = \ MT PV ^ [}h (y ' 2 - y 2 )} [ n m i\ 

or y = lMT Pl+i \MR' 2 -R 2 )} J . ( h 

and Table I can be applied, or 

= 1 — BSpvfj {P*(V*-V ‘ B )] \ (Wvvii) 

and - 1 - 2 ^ 1+i {p a (22'®~ R 2 )}] .^ 

It is needless to add that the ratio test can be also used in such cases, or

$$Q_{\eta'^2/\eta^2} = 2I_{\frac{1}{1+\eta'^2/\eta^2}}(p_1+1,\ p_1+1) \quad\text{and}\quad Q_{R'^2/R^2} = 2I_{\frac{1}{1+R'^2/R^2}}(p_1+1,\ p_1+1) \tag{lxxviii}$$

Here as before we need apparently only $p_1$ to find $Q$, but we have actually to find
$p_2$, or we have no justification for replacing (lxxiii) by (lxxv).

Illustration. In the special case of a normal population with uncorrelated
characters, we know that

$$p_1 = \tfrac12(n-3), \qquad p_2 = \tfrac12(N-n-2) \tag{lxxix}$$

where for $\eta^2$, $N$ is the size of the sample and $n$ the number of arrays on which
$\eta^2$ is based, while for $R^2$, $N$ is again the size of the sample and $n$ = total number of
variates considered, i.e. the dependent variate and $n - 1$ variates with which it is
multiply correlated.

Now it is obvious that in a very large number of cases $N$ will be large, often
very large as compared with $n$. In such cases (lxxv) will apply, and we may write

$$Q_{R'^2-R^2} = 1 - 2S_{\frac12(n-2)}\{\tfrac12(N-n-2)(R'^2-R^2)\} \tag{lxxx}$$

and for the case of the ratio,

$$Q_{R'^2/R^2} = 2I_{\frac{1}{1+R'^2/R^2}}\{\tfrac12(n-1),\ \tfrac12(n-1)\} \tag{lxxx bis}$$









Karl Pearson, S. A. Stouffer and F. N. David 


389 


The length which this paper has already reached hinders our providing special
numerical examples for this section, but they would only be similar in type to
those of the earlier sections, and accordingly little harm will be done by their omission.


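(21) A limitation which detracts somewhat from a full use of the present
methods must have struck the reader. Namely, we may need to compare the constants
of samples which are not of the same size.

In this case our two original equations are of the form

$$y = \frac{M\gamma_1^{\tau_1+1}}{\Gamma(\tau_1+1)}\,u^{\tau_1}e^{-\gamma_1 u}, \qquad y = \frac{M\gamma_2^{\tau_2+1}}{\Gamma(\tau_2+1)}\,v^{\tau_2}e^{-\gamma_2 v},$$

and the combined frequency surface is given by

$$w = \frac{M\gamma_1^{\tau_1+1}\gamma_2^{\tau_2+1}}{\Gamma(\tau_1+1)\,\Gamma(\tau_2+1)}\,e^{-(\gamma_1 u+\gamma_2 v)}\,u^{\tau_1}v^{\tau_2}\,[du\,dv] \tag{lxxxi}$$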


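Let us first approach this from the standpoint of the ratio of $v$ to $u$, and write $z = v/u$.
Then (lxxxi) becomes

$$w = \frac{M\gamma_1^{\tau_1+1}\gamma_2^{\tau_2+1}}{\Gamma(\tau_1+1)\,\Gamma(\tau_2+1)}\,e^{-(\gamma_1+\gamma_2 z)u}\,u^{\tau_1+\tau_2+1}z^{\tau_2}\,[du\,dz].$$

Write $\xi = (\gamma_1+\gamma_2 z)u$ and we have

$$w = \frac{M\gamma_1^{\tau_1+1}\gamma_2^{\tau_2+1}}{\Gamma(\tau_1+1)\,\Gamma(\tau_2+1)}\,\frac{e^{-\xi}\,\xi^{\tau_1+\tau_2+1}z^{\tau_2}}{(\gamma_1+\gamma_2 z)^{\tau_1+\tau_2+2}}\,[d\xi\,dz].$$

Keeping $z$ constant integrate out for $\xi$, the limits of which will be from 0 to $\infty$
corresponding to $u$ from 0 to $\infty$. Thus we find

$$w = \frac{M\gamma_1^{\tau_1+1}\gamma_2^{\tau_2+1}}{B(\tau_1+1,\ \tau_2+1)}\,\frac{z^{\tau_2}}{(\gamma_1+\gamma_2 z)^{\tau_1+\tau_2+2}}\,[dz] \tag{lxxxii}$$

for the distribution curves of the ratio.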
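The integration over $\xi$ that leads to (lxxxii) rests on two standard identities, written out here as a worked step. Nothing is assumed beyond the usual definitions of the Gamma and Beta functions.

```latex
\int_0^\infty e^{-\xi}\,\xi^{\tau_1+\tau_2+1}\,d\xi = \Gamma(\tau_1+\tau_2+2),
\qquad
\frac{\Gamma(\tau_1+\tau_2+2)}{\Gamma(\tau_1+1)\,\Gamma(\tau_2+1)}
  = \frac{1}{B(\tau_1+1,\ \tau_2+1)} .
```

The first identity removes $\xi$; the second turns the product of Gamma functions in the constant into the single B-function of (lxxxii).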

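Now put $z' = z\gamma_2/\gamma_1$ and integrate for the probability $P_{\mathrm{I}}$ of a ratio greater
than $z_0$,

$$P_{\mathrm{I}} = \frac{\gamma_1^{\tau_1+1}\gamma_2^{\tau_2+1}}{B(\tau_1+1,\ \tau_2+1)}\int_{z_0}^{\infty}\frac{z^{\tau_2}\,dz}{(\gamma_1+\gamma_2 z)^{\tau_1+\tau_2+2}} \tag{lxxxiii}$$

Now put $\dfrac{1}{y} = 1 + z\dfrac{\gamma_2}{\gamma_1}$, and hence $dz = -\dfrac{\gamma_1}{\gamma_2}\dfrac{1}{y^2}\,dy$; further when $z = \infty$, $y = 0$, and
when $z = z_0$, $y = \dfrac{1}{1 + z_0\gamma_2/\gamma_1}$. Thus we reach

$$P_{\mathrm{I}} = \frac{1}{B(\tau_1+1,\ \tau_2+1)}\int_0^{\frac{1}{1+z_0\gamma_2/\gamma_1}} y^{\tau_1}(1-y)^{\tau_2}\,dy = I_{\frac{1}{1+z_0\gamma_2/\gamma_1}}(\tau_1+1,\ \tau_2+1) \tag{lxxxiv}$$

where $I$ is the function tabled in the Incomplete B-function Table now at press.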








Just as earlier (see p. 306) we should add to this the other equal wedge for
$u/v > z_0$, or $z$ taking values from 0 to $\dfrac{1}{z_0}$. Calling this $P_{\mathrm{II}}$ we have, from
(lxxxiii),

$$P_{\mathrm{II}} = \frac{\gamma_1^{\tau_1+1}\gamma_2^{\tau_2+1}}{B(\tau_1+1,\ \tau_2+1)}\int_0^{1/z_0}\frac{z^{\tau_2}\,dz}{(\gamma_1+\gamma_2 z)^{\tau_1+\tau_2+2}}.$$

Now transform our integral by taking $y = \dfrac{z\gamma_2}{\gamma_1 + z\gamma_2}$, or limits will be $y = 0$ and
$y = \dfrac{1}{1 + z_0\gamma_1/\gamma_2}$, and the integral becomes

$$P_{\mathrm{II}} = \frac{1}{B(\tau_1+1,\ \tau_2+1)}\int_0^{\frac{1}{1+z_0\gamma_1/\gamma_2}} y^{\tau_2}(1-y)^{\tau_1}\,dy = I_{\frac{1}{1+z_0\gamma_1/\gamma_2}}(\tau_2+1,\ \tau_1+1) \tag{lxxxv}$$

Combining (lxxxiv) and (lxxxv), we reach finally

$$Q_{z_0} = I_{\frac{1}{1+z_0\gamma_2/\gamma_1}}(\tau_1+1,\ \tau_2+1) + I_{\frac{1}{1+z_0\gamma_1/\gamma_2}}(\tau_2+1,\ \tau_1+1) \tag{lxxxvi}$$

a result involving a double entry into the B-function Table.

Thus, if we use the ratio test, (lxxxvi) shows that the determination of $Q_{z_0}$
involves very little more trouble than in the cases we have already dealt with when
$\gamma_1 = \gamma_2$ and $\tau_1 = \tau_2$.


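Illustration. Suppose we have two samples rendering variances with values
$\sigma_1^2$ and $\sigma_2^2$, but that the size of one sample is $n_1$ and of the other $n_2$, and we wish
to test whether they may be considered as drawn from the same parent population
of variance $\Sigma^2$. Then, by (xxviii),

$$\tau_1 = \tfrac12(n_1-3), \qquad \tau_2 = \tfrac12(n_2-3), \qquad \gamma_1 = \frac{n_1}{2\Sigma^2}, \qquad \gamma_2 = \frac{n_2}{2\Sigma^2}.$$

Accordingly

$$Q_{\sigma_2^2/\sigma_1^2} = I_{\frac{1}{1+\frac{\sigma_2^2 n_2}{\sigma_1^2 n_1}}}\{\tfrac12(n_1-1),\ \tfrac12(n_2-1)\} + I_{\frac{1}{1+\frac{\sigma_2^2 n_1}{\sigma_1^2 n_2}}}\{\tfrac12(n_2-1),\ \tfrac12(n_1-1)\} \tag{lxxxvii}$$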
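The double entry of (lxxxvii) can be sketched numerically as below. This is an illustration under stated assumptions, not the authors' procedure: the incomplete B-function ratio is evaluated by a standard continued fraction in place of the printed table, the sample variances are taken with divisor $n$, and the names `betainc_reg` and `ratio_test_q` are hypothetical.

```python
import math


def betainc_reg(a, b, x, max_iter=500, eps=1e-14):
    """Incomplete B-function ratio I_x(a, b) by a continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        # Reflect, so that the continued fraction converges quickly.
        return 1.0 - betainc_reg(b, a, 1.0 - x, max_iter, eps)
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even and odd partial numerators of the continued fraction.
        for num in (
            m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0)),
        ):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            if abs(c) < tiny:
                c = tiny
            delta = c * d
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    return math.exp(log_front) * h / a


def ratio_test_q(var1, n1, var2, n2):
    """Sum of the two I-functions for the ratio of two sample variances."""
    if var2 < var1:
        # Label the samples so that the ratio z0 = var2 / var1 is >= 1.
        var1, n1, var2, n2 = var2, n2, var1, n1
    z0 = var2 / var1
    a, b = 0.5 * (n1 - 1), 0.5 * (n2 - 1)
    lam = 1.0 / (1.0 + z0 * n2 / n1)
    lam_prime = 1.0 / (1.0 + z0 * n1 / n2)
    return betainc_reg(a, b, lam) + betainc_reg(b, a, lam_prime)
```

With equal sample sizes the two terms coincide and the function reduces to twice a single incomplete B-function ratio, which is the form already used for samples of the same size.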


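A question may arise as to whether the sum of the two $I$-functions may not be
greater than unity and so $Q > 1$ and impossible. Clearly the denominators of
both B-function ratios are equal, since

$$B\{\tfrac12(n_1-1),\ \tfrac12(n_2-1)\} = B\{\tfrac12(n_2-1),\ \tfrac12(n_1-1)\}.$$

We can therefore write $Q$ in the form

$$Q = \frac{\displaystyle\int_0^{\lambda} x^{\frac12(n_1-3)}(1-x)^{\frac12(n_2-3)}\,dx + \int_0^{\lambda'} x^{\frac12(n_2-3)}(1-x)^{\frac12(n_1-3)}\,dx}{B\{\frac12(n_1-1),\ \frac12(n_2-1)\}},$$

where

$$\lambda = \frac{1}{1+\dfrac{\sigma_2^2 n_2}{\sigma_1^2 n_1}} \quad\text{and}\quad \lambda' = \frac{1}{1+\dfrac{\sigma_2^2 n_1}{\sigma_1^2 n_2}}.$$

Write $x = 1 - x'$ in the second integral, and we have

$$Q = \frac{\displaystyle\int_0^{\lambda} x^{\frac12(n_1-3)}(1-x)^{\frac12(n_2-3)}\,dx + \int_{1-\lambda'}^{1} x'^{\frac12(n_1-3)}(1-x')^{\frac12(n_2-3)}\,dx'}{B\{\frac12(n_1-1),\ \frac12(n_2-1)\}}.$$

Now the numerator will certainly be less than the denominator if $\lambda < 1 - \lambda'$,
for then the two integrals do not make up the complete B-function. Hence our
condition is that

$$\frac{1}{1+\dfrac{\sigma_2^2 n_2}{\sigma_1^2 n_1}} < 1 - \frac{1}{1+\dfrac{\sigma_2^2 n_1}{\sigma_1^2 n_2}},$$

or that

$$\frac{\sigma_2^2}{\sigma_1^2} > 1.$$

This is a condition always satisfied, since we have started by supposing $\sigma_2^2 > \sigma_1^2$
and asked how much greater that ratio may be than unity, without being too
improbable for both samples to have a common source.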
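The reduction of the condition $\lambda < 1 - \lambda'$ to a statement about the ratio of the variances can be written out step by step. The abbreviations $a$ and $b$ below are introduced here only for this worked line.

```latex
a = \frac{\sigma_2^2 n_2}{\sigma_1^2 n_1}, \qquad
b = \frac{\sigma_2^2 n_1}{\sigma_1^2 n_2};
\qquad
\frac{1}{1+a} < 1 - \frac{1}{1+b} = \frac{b}{1+b}
\iff 1 + b < b\,(1+a)
\iff ab > 1
\iff \left(\frac{\sigma_2^2}{\sigma_1^2}\right)^{2} > 1 .
```

The sample sizes cancel in the product $ab$, so the condition does not depend on $n_1$ and $n_2$.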

We now turn to the problem of the difference $v - u$. Taking Equation (lxxxi),
let us put

$$u + v = X, \qquad v - u = Y,$$

then the surface becomes

$$w = w_0'\,e^{-\frac12(\gamma_2-\gamma_1)Y}\,e^{-\frac12(\gamma_1+\gamma_2)X}\,(X-Y)^{\tau_1}(X+Y)^{\tau_2}\,[dX\,dY].$$

Now we desire to integrate this with regard to $X$ from $Y$ to $\infty$, keeping $Y$ constant,
for the portion of our surface, precisely as in § (3), p. 295. Now write $X = Yt$ and
the limits of $t$ will be 1 to $\infty$; thus the frequency surface for $Y$, which gives the
difference, is

$$z = w_0''\,e^{-\frac12(\gamma_2-\gamma_1)Y}\,Y^{\tau_1+\tau_2+1}\int_1^{\infty} e^{-\frac12(\gamma_1+\gamma_2)Yt}\,(t-1)^{\tau_1}(t+1)^{\tau_2}\,dt \tag{lxxxviii}$$

This curve is allied to the $T_m(x)$ Bessel Function curve and passes into it when
$\tau_1 = \tau_2 = \tau$, but as far as we are aware has not yet been studied. Its consideration
is left for another occasion. If $\tau_1 = \tau_2 = \tau$ and we write

$$Y' = \tfrac12(\gamma_1+\gamma_2)Y, \qquad \rho = \frac{\gamma_2-\gamma_1}{\gamma_2+\gamma_1} \quad\text{and}\quad m = \tau + \tfrac12,$$

we have, by the proper choice of $w_0''$,

$$z = M(1-\rho^2)^{m+\frac12}\,e^{-\rho Y'}\,T_m(Y')\,[dY'] \tag{lxxxviii bis}$$

$T_m(Y')$ not changing sign with $Y'$, and when we put $Y'$ negative we get the other
section of the area we need to integrate to get rid of $X$ as in § (3), p. 295. This curve
has been fully discussed in Biometrika, Vol. XXI. pp. 168-187. Its area when
integrated for $Y'$ between $-\infty$ and $+\infty$ equals $M$, and we have its ordinates tabled*.

* See Biometrika, Vol. XXI. pp. 195-201, or Tables for Statisticians and Biometricians, Part II,
pp. lxxix-lxxxviii and pp. 188-144.



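In terms of the difference $Y$,

$$z = \tfrac12(\gamma_1+\gamma_2)\,M(1-\rho^2)^{m+\frac12}\,e^{-\frac12(\gamma_2-\gamma_1)Y}\,T_m\{\tfrac12(\gamma_1+\gamma_2)Y\}\,[dY] \tag{lxxxix}$$

This curve for practical purposes may be replaced by a Pearson curve of the same
first four moments after $\tau = 11$*.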


Illustration. It may be asked for what type of variates would it be adequate
to have $\tau_1 = \tau_2$? We reply at once for $\eta^2$ and $R^2$, when the sizes of the samples are
considerable as compared with the number of arrays in $\eta^2$ or the number of variates
involved in $R^2$. This is the case where we are justified in using Equations (lxxv),
and consequently (lxxxix). If $n$ be the number of arrays or the total number of
variates, and $N$, $N'$ be the sizes of the samples, then we have

$$\tau_1 = \tau_2 = \tau = \tfrac12(n-3),$$

while

$$\gamma_1 = \tfrac12(N-n-2), \qquad \gamma_2 = \tfrac12(N'-n-2),$$

and the curve of distribution of, say, $\eta'^2 - \eta^2$ will be given by

$$z = \tfrac12 M\,\frac{\{(N-n-2)(N'-n-2)\}^{\frac12(n-1)}}{\{\frac12(N+N')-n-2\}^{\,n-2}}\;e^{-\frac14(N'-N)Y}\;T_{\frac12(n-2)}\{\tfrac14(N+N'-2n-4)Y\}\,[dY] \tag{xc}$$

where $Y = \eta'^2 - \eta^2$,
and we assume the parent population to be without association in its characters.

There is small difficulty in plotting this curve from Dr Elderton's Table of
ordinates, and at present the planimeter or a quadrature formula must be applied
to determine the requisite areas beyond the values $\pm(\eta'^2 - \eta^2)$ observed. An important
point arises from both the Equations (lxxxviii) and (lxxxix), namely that
when the sizes of the samples are unequal, then the difference of two statistical
measures does not give a symmetrical curve, but one which like that of the distribution
of the first product-moment coefficient may be notably skew. The fuller
discussion of these curves must, however, be left to a further paper. It may be
asked: Why, if the ratio-test gives an adequately simple expression for the probability,
should we deal further with the more complicated expressions for the
probability of the difference? The answer, we think, is that the results of the two 
tests are often not so closely in accord that we can be confident one may not for a 
particular case be more correct than the other. The probability deduced leads at 
once to a frequency, and the touchstone of the more correct test is that it should 
give this frequency the more exactly and more often. The only way of determining 
this is the experimental one. This would not be hard in the case of the range of 
based on pairs of samples taken from a uni-variate population. It would, however,
be far more laborious in the case of sample contingency tables taken from a
bi-variate table. Still it is to be hoped that such work will eventually be undertaken. 
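An experimental check of the kind suggested above can be sketched for the simplest case, the difference of two sample variances drawn from one normal population. This is a minimal simulation under stated assumptions: variances use the divisor $n$, the parent has unit variance, and the function names, trial count and seed are all hypothetical choices made for the illustration.

```python
import random


def sample_variance(rng, n):
    """Variance (divisor n) of n draws from a standard normal population."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5."""
    count = len(values)
    mean = sum(values) / count
    m2 = sum((v - mean) ** 2 for v in values) / count
    m3 = sum((v - mean) ** 3 for v in values) / count
    return m3 / m2 ** 1.5


def difference_skewness(n1, n2, trials=5000, seed=1932):
    """Skewness of v - u, where u and v are the variances of samples of
    sizes n1 and n2 drawn independently from the same population."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        u = sample_variance(rng, n1)
        v = sample_variance(rng, n2)
        diffs.append(v - u)
    return skewness(diffs)
```

Calling `difference_skewness(10, 10)` and `difference_skewness(5, 50)` contrasts samples of equal and of unequal size. Under the assumptions above the first should lie near zero and the second should be clearly negative, since the smaller sample contributes the longer tail.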


(22) General Conclusions.

The main purpose of the present paper has been to discuss an alternative to the 
ratio method in dealing with a number of statistical coefficients, such as £*, 
$\eta^2$, $R^2$, $\phi^2$, which on certain hypotheses as to the parent population obey accurately

* loc. cit. p. 181.






or approximately the Type III form of distribution. We have seen that the difference
of two like statistical coefficients follows in its distribution the $T_m(Y)$ Bessel-Function
curve, or in certain cases the more general skew form $e^{-\rho Y}T_m(Y)$. We have provided
a table (Table I) of the probability integral of the former curve, by aid of which the
probability of a given difference can be rapidly found. We have postponed discussion
of a still more general curve, i.e. (lxxxviii), to a later paper.

A number of suitable illustrations were chosen by one of our number at random;
that is to say without knowledge of what would flow from them, and the difference
and the ratio methods both applied. The results show that in the great majority
of cases the difference test is more "stringent" than the ratio test. By this we
merely understand that the probability of the observed result is less by the former
test. Under such conditions, however, it is reasonable to suppose that preference
ought to be given to it. At any rate it justifies the use of the difference test alongside
the ratio test. In order to simplify the application of the latter test, it has
been approached by a new method and a table (Table II) is provided by which the
probability on the ratio test can be at once determined. We have further prepared
a diagram by aid of which it can be rapidly ascertained which of the two tests will be
found the more stringent in a particular case.

In the course of our work we have pointed out difficulties which occur in using
the $\chi^2$ methods, and warned the student of difficulties which may arise if the two
series from which $\chi^2$ is obtained are of very different sizes in the case when their
relative sizes are wholly arbitrary; we indeed question whether, when the relative
sizes are not "naturally" fixed, we ought to use $\chi^2$ at all*. In the case of the
comparison of two $\chi^2$'s we have with some diffidence suggested a possible method
of overcoming some of the difficulty. The reader will see that much yet remains to
be done, especially experimental spade-work, to obtain satisfactory tests for either
the difference or ratio of these statistical coefficients in the case of different sized
samples, especially small ones, when the parent population is unknown.


* There is another point about the usual $\chi^2$ distribution frequently overlooked. Given two series of
$v$ categories each and sizes $N$ and $N'$, then the maximum possible value of $\chi^2$ is $N+N'$. Accordingly the
distribution curve is limited, and should take some such form as


y“ f/o (ix 8 )” 1 (N+N'~ *»)*• .(e,) 

rather than «'**’ .(*»)• 

If N+N' be large, and then only, approach will be made to 


but in our case X = 2 and k=v. Thus («j) becomes 




x*(iV+iV')-l 


which takes the form of ($\epsilon_2$) if $N+N'$ be very large. The Type III curve is therefore no more exact in
the case of $\chi^2$ than it is in the case of $\eta^2$ or $R^2$, it depends upon the large size of $N+N'$. This point is
frequently overlooked, but it is actually involved when we replace the binomial distributions by normal
curves in our deduction of ($\epsilon_2$).







TABLE I. Values of the Probability Integral of the $T_m(x)$ Curve, $m = 0$ to $11.5$, $x = 0.0$ to $18.0$, for determining the Probability
of a Difference, e.g. $\tfrac12\chi'^2 \sim \tfrac12\chi^2$. Here $v = n = 2m + 2$, or $m = \tfrac12(n - 2)$.


[TABLE I: the tabulated values are not legible in this copy.]





















TABLE II. Values of $I_x\{\tfrac12(v-1),\ \tfrac12(v-1)\}$, $v = 2$ to 25 and $x = .01$ to $.50$, for determining
the Probability of Ratios, e.g. $\chi'^2/\chi^2$.

x = .01 to .50, v = 2 to 7

  x       v = 2          v = 3          v = 4          v = 5          v = 6          v = 7
 ½(v-1)   0.5            1              1.5            2              2.5            3

 .01     .063 768 56    .010 000 00    .001 692 55+   .000 298 00    .000 053 74    .000 009 85+
 .02     .090 334 47    .020 000 00    .004 772 77    .001 184 00    .000 300 75-   .000 077 62
 .03     .110 824 69    .030 000 00    .008 741 44    .002 646 00    .000 819 78    .000 258 00
 .04     .128 188 43    .040 000 00    .013 417 07    .004 672 00    .001 664 48    .000 602 21
 .05     .143 666 29    .050 000 00    .018 693 04    .007 250 00    .002 875 76    .001 158 12
 .06     .157 542 43    .060 000 00    .024 496 26    .010 368 00    .004 486 12    .001 970 27
 .07     .170 463 43    .070 000 00    .030 772 22    .014 014 00    .006 521 83    .003 079 93
 .08     .182 554 89    .080 000 00    .037 477 97    .018 176 00    .009 004 21    .004 525 26
 .09     .193 973 37    .090 000 00    .044 578 42    .022 842 00    .011 950 56    .006 341 28
 .10     .204 832 76    .100 000 00    .052 044 02    .028 000 00    .015 374 72    .008 560 00
 .11     .215 219 03    .110 000 00    .059 849 42    .033 638 00    .019 287 59    .011 210 48
 .12     .225 198 90    .120 000 00    .067 972 43    .039 744 00    .023 697 45+   .014 318 90
 .13     .234 825 47    .130 000 00    .076 393 38    .046 306 00    .028 610 26    .017 908 63
 .14     .244 141 78    .140 000 00    .085 094 64    .053 312 00    .034 029 90    .022 000 29
 .15     .253 183 31    .150 000 00    .094 060 20    .060 750 00    .039 958 35-   .026 611 88
 .16     .261 979 76    .160 000 00    .103 275 47    .068 608 00    .046 395 85+   .031 758 75-
 .17     .270 556 26    .170 000 00    .112 726 99    .076 874 00    .053 341 09    .037 453 76
 .18     .278 934 34    .180 000 00    .122 402 29    .085 536 00    .060 791 27    .043 707 34
 .19     .287 132 59    .190 000 00    .132 289 75-   .094 582 00    .068 742 25-   .050 527 51
 .20     .295 167 24    .200 000 00    .142 378 49    .104 000 00    .077 188 63    .057 920 00
 .21     .303 052 54    .210 000 00    .152 658 26    .113 778 00    .086 123 84    .065 888 31
 .22     .310 801 12    .220 000 00    .163 119 39    .123 904 00    .095 540 22    .074 433 78
 .23     .318 424 23    .230 000 00    .173 752 67    .134 366 00    .105 429 11    .083 555 66
 .24     .325 931 94    .240 000 00    .184 549 36    .145 152 00    .115 780 88    .093 251 17
 .25     .333 333 33    .250 000 00    .195 501 11    .156 250 00    .126 585 00-   .103 515 62
 .26     .340 636 66    .260 000 00    .206 599 90    .167 648 00    .137 830 12    .114 342 43
 .27     .347 849 40    .270 000 00    .217 838 05+   .179 334 00    .149 504 09    .125 723 19
 .28     .354 978 44    .280 000 00    .229 208 15-   .191 296 00    .161 594 04    .137 647 82
 .29     .362 030 07    .290 000 00    .240 703 03    .203 522 00    .174 086 40    .150 104 54
 .30     .369 010 12    .300 000 00    .252 315 79    .216 000 00    .186 966 96    .163 080 00
 .31     .375 923 99    .310 000 00    .264 039 69    .228 718 00    .200 220 90    .176 559 54
 .32     .382 776 69    .320 000 00    .275 868 23    .241 664 00    .213 832 81    .190 526 26
 .33     .389 572 92    .330 000 00    .287 795 04    .254 826 00    .227 786 80    .204 963 09
 .34     .396 317 08    .340 000 00    .299 813 93    .268 192 00    .242 066 44    .219 850 85+
 .35     .403 013 32    .350 000 00    .311 918 83    .281 750 00    .256 654 85-   .235 169 38
 .36     .409 665 53    .360 000 00    .324 103 83    .295 488 00    .271 534 73    .250 897 31
 .37     .416 277 43    .370 000 00    .336 363 11    .309 394 00    .286 688 37    .267 012 22
 .38     .422 852 55+   .380 000 00    .348 690 97    .323 456 00    .302 097 72    .283 490 70
 .39     .429 394 26    .390 000 00    .361 081 79    .337 662 00    .317 744 35+   .300 308 37
 .40     .435 905 78    .400 000 00    .373 530 04    .352 000 00    .333 609 56    .317 440 00
 .41     .442 390 22    .410 000 00    .386 030 28    .366 458 00    .349 674 36    .334 859 57
 .42     .448 850 58    .420 000 00    .398 577 12    .381 024 00    .365 919 48    .352 540 34
 .43     .455 289 74    .430 000 00    .411 165 24    .395 686 00    .382 325 47    .370 454 92
 .44     .461 710 54    .440 000 00    .423 789 37    .410 432 00    .398 872 64    .388 575 33
 .45     .468 115 72    .450 000 00    .436 444 29    .425 250 00    .415 541 14    .406 873 12
 .46     .474 507 97    .460 000 00    .449 124 80    .440 128 00    .432 310 98    .425 319 39
 .47     .480 889 93    .470 000 00    .461 825 74    .455 054 00    .449 162 04    .443 884 85+
 .48     .487 264 21    .480 000 00    .474 542 00    .470 016 00    .466 074 10    .462 539 98
 .49     .493 633 38    .490 000 00    .487 268 45+   .485 002 00    .483 026 87    .481 255 00-
 .50     .500 000 00    .500 000 00    .500 000 00    .500 000 00    .500 000 00    .500 000 00
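Several columns of Table II have simple closed forms, which gives a quick way to check individual entries. The sketch below is an illustration, not the authors' method of computation: it uses the arcsine form for the $v = 2$ column and the binomial-tail identity for the odd values of $v$, where $\tfrac12(v-1)$ is a whole number. The function names are hypothetical.

```python
import math


def i_half_half(x):
    """I_x(1/2, 1/2) = (2 / pi) * arcsin(sqrt(x)); the v = 2 column."""
    return 2.0 / math.pi * math.asin(math.sqrt(x))


def i_equal_integer(p, x):
    """I_x(p, p) for a whole number p >= 1, i.e. v = 2p + 1.

    Uses the identity I_x(p, p) = sum over j = p .. 2p-1 of
    C(2p-1, j) * x**j * (1 - x)**(2p-1-j).
    """
    n = 2 * p - 1
    return sum(math.comb(n, j) * x ** j * (1.0 - x) ** (n - j)
               for j in range(p, n + 1))
```

For example, `i_equal_integer(3, 0.10)` gives 0.00856, the entry for $v = 7$ at $x = .10$, and `i_half_half(0.25)` gives one third, the entry for $v = 2$ at $x = .25$.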









848 Further Application* in Statistics of T m (x) Bessel Function 


*-•01—-50 


TABLE II ( continued ). 


v-8—13 


X 

i(t>-4=3-5 

9 

4 

10 

4-5 

11 

5 

12 

5*5 

18 

6 

X 

•01 

•000 001 83 

•000 000 34 

•000 000 06 

•000 000 01 



•01 

•02 

•000 020 26 

•000 005 34 

•000 001 41 

•000 000 38 

•000 000 10 

•000 000 03 

*02 

•03 

•000 082 12 

•000 026 36 

•000 008 52 

•000 002 77 

•000 000 90 

•000 000 29 

•03 

•04 

•000 220 32 

•000 081 28 

•000 030 18 

•000 011 27 

•000 004 23 

•000 001 69 

*04 

•05 

•000 471 53 

•000 193 58 

'000 079 99 

•000 033 22 

*000 013 86 

•oa) 005 80 

*05 

•06 

•000 874 69 

•000 391 49 

•000 176 34 

•000 079 84 

*000 036 30 

•000 016 56 

•06 

•07 

•001 469 99 

■000 707 24 

•000 342 42 

•000 166 62 

•000 081 42 

•000 039 92 

•07 

•08 

•002 298 07 

•001 176 28 

•000 605 82 

•000 313 58 

•000 162 98 

•000 085 00+ 

•08 

•09 

•003 399 45" 

•001 836 58 

•000 998 30 

•000 545 31 

•000 299 08 

•000 164 69 

•09 

•10 

•004 813 96 

•002 728 00 

•001 555 21 

*000 890 92 

*000 512 41 

•000 295 71 

•10 

•11 

*006 580 38 

*003 891 63 

•002 315 12 

•001 383 83 

•000 830 43 

•000 499 98 

•11 

•12 

•008 736 05+ 

*005 369 26 ' 

•003 319 17 

•002 061 48 

•001 285 31 

•000 803 99 

•12 

•13 

•oil 316 60 

*007 202 82 

•004 610 61 

•002 964 92 

*001 913 91 

•001 239 42 

•13 

•14 

•014 355 68 

•009 433 86 

•006 234 16 

•004 138 37 

•002 757 43 

•001 843 09 

•14 

•16 

•017 884 79 

•012 103 17 

*008 235 49 

•005 628 66 

•003 861 14 

•002 656 86 

•15 

•16 

•021 933 07 

•015 250 28 

•010 660 61 

•007 484 70 

•005 273 88 

•003 727 40 

•16 

*17 

*026 527 17 

•018 913 11 

•013 555 36 

•009 756 81 

•007 047 55+ 

•005 105 78 

•17 

•18 

*031 691 16 

*023 127 64 

•016 964 83 

•012 496 17 

•009 236 46 

•006 847 03 

•18 

•19 

•037 446 37 

•027 827 56 

•020 932 88 

•015 754 09 

•oil 896 70 

•009 009 50+ 

•19 

•20 

•043 811 41 

•033 344 00 

•025 501 63 

•019 581 44 

•015 085 40 

•Oil 654 21 

*20 

•21 

•050 802 04 

•039 405 31 

•030 711 00 

•024 027 96 

•018 859 97 

•014 843 99 

•21 

*22 

•068 431 16+ 

•046 136 81 

•036 598 30 

•029 141 65+ 

•023 277 34 

•018 642 73 

•22 

•23 

•066 708 79 

•053 560 62 

•043 197 81 

•034 968 19 

•028 393 20 

•023 114 40 

•23 

•24 

•075 642 09 

•061 695 47 

•050 540 44 

•041 550 31 

•034 261 18 

•028 322 19 

•24 

*26 

*085 235 33 

•070 556 64 

•058 653 40 

•048 927 31 

•040 932 12 

•034 327 51 

*25 

•26 

*095 489 93 

•080 155 78 

•067 559 93 

•057 134 50- 

•048 453 32 

•041 189 04 

•26 

•27 

•106 404 49 

*090 500 89 

•077 279 01 

•066 202 79 

*056 867 88 

•048 961 83 

•27 

•28 

•117 974 83 

*101 596 24 

•087 825 23 

•076 158 25+ 

*066 213 99 

•057 696 37 

*28 

•29 

*130 194 04 

•113 442 36 

•099 208 53 

•087 021 75+ 

•076 524 39 

*067 437 74 

•29 

•30 

•143 052 55+ 

•126 036 00 

•111 434 17 

•098 808 66 

•087 825 79 

*078 224 79 

•30 

•31 

•156 538 20 

•139 370 22 

•124 502 57 

•111 528 58 

•100 138 40 

•090 089 40 

•31 

•32 

*170 630 31 

•153 434 35 + 

•138 409 26 

•125 185 15+

•113 475 54 

•103 055 86 

•32 

•33

•185 329 77 

‘168 214 12 

•153 144 92 

•139 775 93 

‘127 843 33 

•117 140 27 

•33 

•34 

*200 599 12 

•183 691 68 

•168 695 35- 

•155 292 26 

•143 240 39 

•132 350 07 

•34 

*35 

*216 422 66 

•199 845 73 

•185 041 56 

•171 719 29 

•159 657 74 

•148 683 72 

*35 

•36 

•232 776 67 

•216 651 65 + 

•202 159 85" 

•189 035 96 

•177 078 63 

•166 130 40 

*36 

•37 

•249 634 99 

•234 081 58

•220 021 92 

•207 215 12 

•195 478 62 

•184 669 86 

•37 

•38 

•266 970 13 

•252 104 62 

•238 595 07 

•226 223 66 

•214 826 60 

*204 272 46 

•38 

•39 

•284 752 43 

•270 686 94 

•257 842 33 

•246 022 66 

•235 079 98 

•224 899 18 

•39 

•40 

•302 950 64 

•289 792 00 

•277 722 72 

•266 567 68 

•256 194 90 

•246 501 87 

•40 

•41 

•321 531 96 

*309 380 70 

•298 191 47 

•287 809 02 

•278 116 57 

•269 023 54 

•41 

•42 

•340 462 20 

•329 411 61 

•319 200 28 

•309 692 05- 

•300 784 62 

•292 398 79. 

•42 

•43 

•359 705 85+ 

•349 841 12

•340 697 63 

•332 157 58 

•324 132 59 

•316 554 32 

•43 

•44 

•379 226 29 

•370 623 73 

*362 629 05+ 

•355 142 26 

•348 088 40 

•341 409 58 

•44 

•45

•398 986 85" 

•391 712 20 

•384 937 60“ 

•378 579 05+ 

•372 574 95+ 

•366 877 42 

•45 

•46

*418 946 02 

•413 057 85+ 

•407 563 65" 

•402 397 68 

•397 510 74 

*392 864 92 

•46 

•47 

•439 067 55+ 

•434 610 74 

*430 446 28 

•426 525 10 

•422 810 52 

•419 274 23 

•47 

•48 

•459 310 62 

•456 319 93 

•453 622 63 

•450 886 10 

•448 385 98 

•446 003 47 

•48 

•49 

•479 634 96 

•478 133 76" 

*476 728 77 

•475 403 74 

•474 146 52 

*472 947 73 

•49 

•50

•500 000 00 

•500 000 00 

•500 000 00 

•500 000 00 

•500 000 00 

•500 000 00 

*50 



Karl Pearson, S. A. Stouffer and F. N. David 


TABLE II (continued).

x = •01–•50        ν = 14–19

ν = 14, 15, 16, 17, 18, 19
½(ν−1) = 6•5, 7, 7•5, 8, 8•5, 9


*000 000 01 
*000 000 10 
•000 000 60 

•000 000 03 
•000 000 23 

•000 000 01 
•000 000 09 

•000 000 03 

•000 000 01 


*000 002 44 

•000 001 03 

•000 000 43 

•000 000 18 

•000 000 08 

•000 000 03 

*000 007 58 

•000 003 48 

•000 001 60 

*000 000 74 

•000 000 34 

•000 000 16 

•000 019 64 

•000 009 68 

•000 004 79 

•000 002 37 

•000 001 18 

•000 000 58 

*000 044 46 

•0001 32 

•000 012 25+ 

•000 006 45+ 

•000 003 40 

*000 001 80 

•000 090 84 

•000 050 26

•000 027 87 

•000 015 48 

•000 008 62

•000 004 80 

•000 171 14 

*000 099 29 

•000 057 73 

•000 033 63 

•000 019 62 

•000 011 47 

•000 301 88 

•000 182 71 

•000 110 82 

•000 067 34 

•000 040 98 

•000 024 98 

•000 504 31 

•000 317 09 

•000 199 79 

•000 126 12 

•000 079 74 

•000 050 49 

*000 804 83 

•000 523 85+ 

•000 341 67 

•000 223 26 

•000 146 11 

•000 095 76 

•001 235 25+ 

*000 829 80 

•000 558 56 

•000 376 66 

•000 254 40 

•000 172 06 

•001 833 03 

•001 267 55- 

•000 878 26 

•000 609 61 

•000 423 79 

•000 295 03 

*002 641 24 

•001 875 80 

•001 334 80 

•000 951 48 

•000 679 29 

•000 485 63 

•003 708 45+ 

•002 699 49 

•001 968 83 

•001 438 39 

•001 052 45- 

•000 771 11 

*005 088 43 i 

•003 789 71 

•002 827 82 

•002 113 61 

•001 582 13 

•001 185 88 

•006 839 73 

•005 203 53 

*003 966 12 

•003 027 93 

•002 315 06 

•001 772 34 

•009 025 04 

•007 003 56 

•005 444 77 

•004 239 75“ 

•003 306 16 

•002 581 46

•011 710 56

•009 267 38 

•007 331 16 

•005 814 91 

•004 618 74 

*003 673 27 

•014 965 07 

•012 036 80 

•009 698 41 

•007 826 37 

•006 324 38 

•005 116 96 

•018 859 04 

•015 416 95- 

•012 624 57 

•010 353 56 

*008 502 51 

•006 990 84 

•023 463 59 

•019 475 23 

•016 191 67 

•013 481 54 

•011 239 78

•009 381 85+ 

•028 849 42 

•024 290 14 

•020 484 48 

•017 299 84 

•014 629 02 

•012 384 78 

•035 085 64 

•029 940 04 

•025 589 25- 

•021 901 19 

■018 768 05“ 

*016 101 16 

*042 238 66 

•036 501 79 

•031 592 19 

•027 379 97 

•023 758 09 

*020 637 80 

•050 370 99 

•044 049 36 

•038 577 95+ 

•033 830 47 

•029 702 04 

•026 104 99 

•059 540 16 

•052 652 47 

*046 627 96 

•041 345 12 

•036 702 44 

•032 614 43 

•069 797 56 

•062 375 21 

•055 818 76 

•050 012 54 

•044 859 37 

•040 276 94 

•081 187 50+ 

*073 274 66 

•066 220 38 

•059 916 58 

•054 268 23 

•049 199 94 

•093 746 17 

•085 399 64 

•077 894 69 

•071 129 37 

•065 017 36 

•059 484 84 

•107 500 85+ 

*098 789 54 

•090 893 88 

•083 719 39 

•077 185 82 

•071 224 36 

•122 469 15+ 

•113 473 24 

•105 259 06 

•097 739 68 

•090 841 13 

•084 499 88 

•138 658 39 

•129 468 23 

•121 018 94 

•113 231 13 

•106 037 17 

•099 378 86 

•156 065 13 

•146 779 78 

•138 188 78 

•130 220 07 

•122 812 26 

*115 912 48 

•174 674 82 

•165 400 43 

•156 769 47 

•148 716 95+ 

•141 187 52 

*134 133 51 

•194 461 65- 

•185 309 55+ 

•176 746 88 

•168 715 38 

•161 165 45+ 

•154 054 43 

•215 388 47 

•206 473 17 

•198 091 45+ 

•190 191 42 

•182 728 93 

•175 666 06 

•237 406 98 

•228 843 95+ 

•220 758 00 

*213 103 18 

•205 840 51 

•198 936 49 

•260 457 95+ 

•252 361 44 

•244 685 83 

' *237 390 78 

•230 442 12 

•223 810 52 

•284 471 69 

•276 952 44 

•269 799 10 

•262 976 59 

•256 455 26- 

•250 209 65+ 

•309 368 62 

•302 531 68 

•296 007 46 

•289 765 87 

•283 781 46 

•278 032 49 

•335 059 97 

•329 002 56 

•323 206 91 

*317 647 66 

•312 303 36 

•307 156 73 

•361 448 66 

•356 258 18 

•351 280 94 

•346 496 07 

•341 885 97 

*337 436 62 

•388 430 23 

*384 182 49 

•380 101 86 

. -376 171 82 

•372 378 46 

•368 709 69 

•415 893 90 

•412 651 52 

*409 532 32 

•406 524 00 

•403 616 22 

•400 800 13 

•443 723 76 

•441 534 89 

•439 427 07 

•437 392 14 

•435 423 24

•433 514 52 

•471 799 96 

•470 697 27 

•469 634 78 

•468 608 41 

*467 614 74 

*466 650 88 

•500 000 00 

*500 000 00 

*500 000 00 

•500 000 00 

•500 000 00 

•500 000 00 
























350 Further Applications in Statistics of T m (x) Bessel Function 

TABLE II (continued). 

x = •01–•50        ν = 20–25

ν = 20, 21, 22, 23, 24, 25
½(ν−1) = 9•5, 10, 10•5, 11, 11•5, 12

x

*01 

•02 

•03 

•04 

•05

•06 

•07 

•000 000 01 

•000 000 07 
•000 000 29 

•000 000 01 

•000 000 03 
•000 000 14 

•000 000 01 
•000 000 07 

•000 000 01 
•000 000 04 

•000 000 02 

•000 000 01 

•01 

•02 

•03 

•04 

•05 

•06 

•07 

.08 

•000 000 95+ 

•000 000 50+

*000 000 27 

•000 000 14 

•000 000 07 

•000 000 04 

•08 

•09 

•000 002 68 

•000 001 50- 

•000 000 84 

•000 000 47 

•000 000 28 

•000 000 15- 

•09 

•10 

•000 006 71 

•000 003 93 

•000 002 30 

•000 001 35+ 

•000 000 80 

•000 000 47 

•10 

•11 

•000 015 25“ 

•000 009 32 

•000 005 70 

•000 003 49 

*000 002 14 

•000 001 31 

•11 

•12 

*000 032 01 

*000 020 32 

•000 012 91 

•000 008 21 

•000 005 23 

000 003 33 

•12 

•13 

•000 062 86- 

•000 041 29 

•000 027 16 

•000 017 88 

•000 011 78 

•000 007 77 

•13 

•14 

•000 116 53 

•000 079 01 

•000 053 62 

•000 036 43 

•000 024 77 

*000 016 86 

•14 

•15

•000 205 66- 

•000 143 51 

•000 100 25" 

•000 070 09 

■000 049 08+ 

•000 034 36 

•15

•16 

•000 347 61 

•000 249 10 

•000 178 69 

•000 128 30 

•000 092 20 

•000 066 31 

•16 

•17 

•000 565 66 

•000 415 42 

•000 305 38 

•000 224 70 

•000 165 47 

•000 121 96 

•17 

•18 

•000 889 94 

•000 668 58 

•000 502 78 

•000 378 44 

•000 285 08 

•000 214 92 

•18 

•19 

•001 358 45+ 

•001 042 34 

•000 800 56 

•000 615 42 

•000 473 48 

*000 364 66+ 

•19 

•20 

•002 017 96 

•001 579 12 

•001 236 90 

•000 969 70 

•000 760 83 

•000 597 39

•20 

•21 

•002 924 66 

•002 331 03 

•001 859 63 

•001 484 85“ 

•001 186 54 

•000 948 85+ 

•21 

•22 

•004 144 68 

•003 360 54 

•002 727 27 

•002 215 21 

•001 800 70 

•001 464 80 

•22 

•23 

•005 754 21 

•004 741 03 

•003 909 79 

•003 226 96 

•002 665 41 

•002 203 14 

•23 

•24 

•007 839 40 

•006 556 89 

•005 489 06 

•004 598 86

•003 855 91 

•003 235 21 

•24 

•25

•010 495 75+ 

•008 903 28 

•007 558 96

•006 422 71 

•005 461 25“ 

*004 646 85“ 

•25 

•26 

•013 827 25+ 

•011 885 44

•010 224 94

•008 803 24

•007 584 63 

•006 539 01 

•26 

•27 

•017 946 01 

*015 617 59 

•013 603 19 

•011 857 54

•010 343 10 

*009 027 87 

•27 

•28 

•022 965 63 

•020 221 26 

•017 819 13 

•015 713 86 

•013 866 69 

•012 244 30 

•28 

•29 

•029 008 64 

•025 823 31 

•023 005 53 

•020 509 80 

•018 296 81 i 

•016 332 51 

*29 

•30 

•036 195 01 

•032 553 36 

•029 300 01 

•026 389 94 

•023 784 00 

•021 448 00

*30 

•31 

•044 643 62 

•040 540 98 

•036 842 03 

•033 502 81 

•030 484 93 

•027 764 67 

•31 

•32 

•054 468 26 

•049 912 51 

•045 769 58 

*041 997 35- 

•038 568 78 

•035 421 14 

•32 

*33 

•065 775 56 

•060 787 68 

•056 215 44 

•052 018 98 

•048 163 08 

*044 616 45+ 

•33 

*34 

•078 660 87 

•073 276 06 

•068 303 28 

*063 705 29 

•059 449 03 

•055 505 05- 

•34 

•35 

•093 205 72 

•087 473 60 

•082 143 66 

•077 181 60+ 

•072 556 54 

•068 241 42 

*35 

•36 

•109 474 86- 

•103 459 16 

•097 830 02 

*092 556 03 

•087 609 19 

•082 964 44 

•36

•37 

•127 513 53 

•121 291 33 

•115 434 93 

•109 915 98 

•104 709 14 

•099 791 71 

•37 

•38 

•147 345 27 

•141 005 63 

•135 006 62 

•129 323 12 

•123 932 38 

•118 814 04 

•38 

•39 

•168 969 90 

•162 611 64 

•156 565 91 

•150 810 19 

•145 324 36 

•140 090 39 

*39 

•40 

•192 362 12 

•186 092 02 

•180 103 87 

•174 377 87 

*168 896 34 

*163 643 44 

•40 

•41 

•217 470 63 

•211 400 28 

•205 580 00 

•199 992 54 

•194 622 61 

•189 456 14 

•41 

•42 

•244 217 84 

•238 460 66 

•232 921 30 

•227 584 86 

•222 438 tl 

•217 469 18 

*42 

•43 

•272 500 17 

•267 168 16

•262 022 11 

•257 049 35“ 

•252 238 59 

•247 579 79 

•43 

•44 

•302 188 97 

•297 389 37 

•292 744 90 

•288 244 94 

•283 880 06 

•279 641 85+ 

*44 

•45

•333 132 07 

•328 964 09 

•324 921 83 

•320 996 61 

•317 180 76- 

•313 467 35+ 

•45

•46

•365 156 90

•361 707 62 

•358 357 27 

•355 098 03 

•351 923 82 

•348 829 25+ 

•46 

•47 

•398 068 17 

•395 413 72 

•392 831 04 

•390 315 05- 

•387 861 26 

•385 465 66 

*47 

•48 

•431 660 96+ 

•429 858 18 

•428 102 39 

•426 390 22 

•424 718 70 

•423 085 19 

•48 

•49 

•465 714 30 

•464 802 84 

•463 914 59

•463 047 90 

*462 201 28 

•461 373 41 

•49

•50 

•500 000 00 

•500 000 00 

•500 000 00

•500 000 00 

*500 000 00 

•500 000 00 

*50 







EXPERIMENTAL DISCUSSION OF THE ($\chi^2$, P) TEST FOR

GOODNESS OF FIT. 

By KARL PEARSON. 

(1) Introductory. In the Philosophical Magazine for 1900* I published, I think
for the first time, a test which has since been much used for statistical purposes
and has come to be spoken of as the ($\chi^2$, P) test. The problem I had in mind was
the following one: Samples of N are taken from a population classed in $\nu$ categories,
the chance of drawing an individual from the sth category being $p_s$; how will the
cell contents $n_1, n_2 \dots n_s \dots n_\nu$ of the samples distribute themselves, and what is the
probability, P, that samples may be drawn which deviate more from the mean
parental values than a given sample?

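With certain limitations the answer was shown to lie in the distribution of $\chi^2$
by the curve

$$y = y_0\,(\tfrac{1}{2}\chi^2)^{\frac{1}{2}(\nu-3)}\,e^{-\frac{1}{2}\chi^2} \qquad \text{(i)},$$

where

$$\chi^2 = \sum_{s=1}^{\nu} \frac{(n_s - Np_s)^2}{Np_s}.$$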

A short table was given in my original paper, and a longer one shortly afterwards
computed by W. Palin Elderton†, by means of which it was possible to
compute P from a known $\chi^2$ and $\nu$ (or n′ as it was termed at that date). In dealing
with the formula for $\chi^2$ the $p_s$'s might be those of a real parental population from
which the sample had been actually drawn, or a hypothetical population from which
we question whether it could reasonably be supposed to have been drawn.
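The calculation described above can be sketched in a few lines. The following is only an illustrative sketch, not the procedure of the original tables: the function names `chi_sq` and `prob_exceeding` are hypothetical, and the closed form used for P is the standard one that holds when n′ is odd (so that n′ − 1 is even), which covers the n′ = 15 used later in this paper.

```python
import math

def chi_sq(observed, probs):
    """Chi-squared of observed cell counts against category chances p_s."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

def prob_exceeding(chi2, n_prime):
    """P that chi-squared exceeds chi2 for n' categories (odd n' only).

    Assumption: uses the finite series valid for an even number,
    n' - 1, of degrees of freedom; it is not a general routine.
    """
    dof = n_prime - 1
    if dof < 2 or dof % 2:
        raise ValueError("closed form needs an odd n' of at least 3")
    half = chi2 / 2.0
    term, total = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= half / k          # builds (chi2/2)^k / k!
        total += term
    return math.exp(-half) * total

# Table III of this paper pairs chi-squared = 10.1715 with P = .7491 for n' = 15.
print(round(prob_exceeding(10.1715, 15), 3))
```

The series gives a value within about .001 of the tabled P; small differences are to be expected if the printed figures were obtained by interpolation in a table.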

Some time after this (1911‡) I published a second paper which discussed the
problem, whether two observed samples of the same or different sizes, but with the
same number $\nu$ of categories, could reasonably be supposed to have been drawn from
the same parent population. The problem is straightforward if the parent population
be known. If it be not known, what values can be used to represent it? With some
diffidence I suggested the values of the combined samples might be used to replace
the parental population values. It was shown that in either of these cases we might
use the ($\chi^2$, P) test entering with $\nu$ cells (= n′ of the ($\chi^2$, P) Tables). The point to
be emphasised here is: That when we use the known parent population, the two
samples cannot be written as a contingency table, for the marginal total of $\nu$ cells is
not the sum of the contribution of each sample to a given cell, thus it is not a marginal

* Vol. L. pp. 157–175.

† Biometrika, Vol. i. pp. 155–163. See also: Tables for Statisticians and Biometricians, Part I,
Table XII, 3rd Edn. 1931.

‡ Biometrika, Vol. viii. pp. 250–254.







total in the contingency table sense. For this reason it is in my view a mistake to
look upon every comparison of two samples or two series as a contingency table; it
only becomes a contingency table in form when we, ignorant of the sampled or parent
population, make the doubtful hypothesis that the latter population can be replaced
by the sum of the two samples. This is so often overlooked in the text-book
treatment of the $\chi^2$ test for the co-origin of two samples that it is desirable to
emphasise it.
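To make the distinction concrete, the sketch below computes $\chi^2$ for two samples when the unknown parent population is replaced by the combined samples. It is an illustrative sketch only: the function name `two_sample_chi_sq` is hypothetical, and the expression is the usual pooled form, which coincides numerically with the contingency-table value for a table of two rows; it is not quoted from the 1911 paper.

```python
def two_sample_chi_sq(first, second):
    """Chi-squared for two samples, with the pooled cells standing in for the parent.

    `first` and `second` are cell counts over the same categories.
    Assumption: cells empty in both samples contribute nothing.
    """
    n, m = sum(first), sum(second)
    total = 0.0
    for f, g in zip(first, second):
        if f + g == 0:
            continue
        total += n * m * (f / n - g / m) ** 2 / (f + g)
    return total

# Two samples of 30 over two categories.
print(two_sample_chi_sq([10, 20], [20, 10]))
```

When the parent population is known, each sample would instead be compared with it directly through the ordinary single-sample $\chi^2$, and no pooled marginal totals enter.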


But the biserial table formed from two samples differs seriously in another
respect from a true contingency table. In the latter case, according to my
envisaging of it, we start with a parent population and, drawing individuals in
succession from it, table them according to two (or it may be more) characters.
Thus in successive samples, say of size N, it is not only the contents of the $\kappa \times \lambda$
cells formed by $\kappa$ categories of one character and $\lambda$ of the other which vary from
sample to sample, but also the marginal totals. In such a case there is only one
degree of constraint on the contingency, arising from the size of the sample N.


In a paper published in 1916* I dealt with what I termed “partial contingency,”
namely, cases in which linear relations between the contents of certain numbers
of cells existed from sample to sample. I proved in such cases that not only the
n′ by which we enter the ($\chi^2$, P) table must be reduced by the number of such
linear relations, but the observed $\chi^2$ itself might also according to the nature of
these constraints have to be reduced. To express the matter algebraically let $n_{st}$ be
the sample frequency in the sth row and tth column cell of the sample, and if $p_{st}$
be the chance of drawing an individual from the s, t cell of the parent population,
the expected value in the s, t cell of the sample will be $\bar n_{st} = Np_{st}$, and if

$$\phi^2 = \sum_{s,t} \frac{(n_{st} - \bar n_{st})^2}{N\bar n_{st}},$$

we have

$$N\phi^2 = \chi^2, \text{ say}, = \sum_{s,t} \frac{(n_{st} - \bar n_{st})^2}{\bar n_{st}} \qquad \text{(ii)}.$$


Now the value of $\phi^2$ or $\chi^2$ will depend entirely on what value we give to $\bar n_{st}$. My
idea was to give it such a value that $\phi^2$ would lead (subject of course to the error
of random sampling) to a measure of the association of the two characters or variates
in the table. In order to achieve this I assumed the parent population to have no
contingency or association between its variates. In this case
$\bar n_{st} = Np_{s\cdot}\,p_{\cdot t} = \bar n_{s\cdot}\,\bar n_{\cdot t}/N$,
where $p_{s\cdot}$ and $p_{\cdot t}$ are the respective chances of an individual being drawn out of the
sth row, and an individual being drawn out of the tth column of the parental
population, and $\bar n_{s\cdot}$ and $\bar n_{\cdot t}$ are the respective relative frequencies of row and column
for samples of size N. It will be seen that $\bar n_{s\cdot}$ and $\bar n_{\cdot t}$ are not so far the sum of the
sth row and tth column of any particular observed sample. We have

$$N(1+\phi^2) = N + \chi^2 = N\sum_{s,t}\frac{n_{st}^2}{\bar n_{s\cdot}\,\bar n_{\cdot t}} \qquad \text{(iii)}.$$
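The step leading to (iii) is not written out in the text. A short derivation, offered as a sketch and assuming that $\bar n_{st}$ denotes the expected cell frequency, that $\bar n_{s\cdot}$ and $\bar n_{\cdot t}$ denote the expected row and column frequencies, and that both observed and expected frequencies total N, might run as follows:

```latex
\begin{align*}
\chi^2 &= \sum_{s,t}\frac{(n_{st}-\bar n_{st})^2}{\bar n_{st}}
        = \sum_{s,t}\frac{n_{st}^2}{\bar n_{st}} - 2\sum_{s,t} n_{st} + \sum_{s,t}\bar n_{st}
        = \sum_{s,t}\frac{n_{st}^2}{\bar n_{st}} - N, \\
\bar n_{st} &= \frac{\bar n_{s\cdot}\,\bar n_{\cdot t}}{N}
\;\Longrightarrow\;
N + \chi^2 = N\sum_{s,t}\frac{n_{st}^2}{\bar n_{s\cdot}\,\bar n_{\cdot t}},
\qquad \chi^2 = N\phi^2 \;\Longrightarrow\; N + \chi^2 = N(1+\phi^2).
\end{align*}
```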

Now if we do not know the parent population, we may follow one of two courses: 


* Biometrika , Vol. xi. pp. 145—158 and pp. 159—190. 






(a) Assume $\bar n_{s\cdot}$ and $\bar n_{\cdot t}$ are constants, values for the unknown parent population,
and determine on the basis of their being constants the mean and variance, etc. of
$\phi^2$ and $\chi^2$ in terms of the unknown algebraic quantities typified by $\bar n_{s\cdot}$ and $\bar n_{\cdot t}$.
Finally in the formula so reached we are compelled to insert in want of better
information the values of the $n_{s\cdot}$, $n_{\cdot t}$ actually reached in the observed Table.

(b) Assume that $\bar n_{s\cdot}$ and $\bar n_{\cdot t}$ are for each individual sample replaced by the
sample values; we thus get a different definition of $\phi^2$ and $\chi^2$, and the mean and
variances etc. will not be the same in cases (a) and (b). For in the latter case we
have to allow for the variation of $n_{s\cdot}$ and $n_{\cdot t}$ in the formula

$$N(1+\phi^2) = N + \chi^2 = N\sum_{s,t}\frac{n_{st}^2}{n_{s\cdot}\,n_{\cdot t}} \qquad \text{(iv)}.$$

$\phi^2$ has been discussed from the standpoints of both (a) and (b) in a series of previous
papers in this Journal*. What I want to emphasise is that if we start with (iv) as
a definition of $\phi^2$ and sample a parent population by taking individuals out one at
a time and recording their characters, we obtain samples in which there is no fixing
of either marginal total column. On the other hand, if we draw two independent
samples, say, of boys and girls for eye colours, one from a population of boys, another
from one of girls, we are dealing with a wholly different method of sampling. We
can form a spurious contingency table out of these two rows with $2\nu$ cells, but
theoretically we are limited to $\nu$ cells, as I showed in the original treatment of the
problem, and little appears to be gained by saying we have introduced $\nu$ conditions
of constraint.

(2) Goodness of Fit. I now turn to the main topic of this paper, the application
of the $\chi^2$ test to the problem of “goodness of fit.” Here again divergence of opinion
seems to be largely based on difference of aim and definition.

Suppose we take a random sample from a population, the whole of which
we cannot observe or measure. The object of the anthropologist or craniologist is
to ascertain how far, when making comparison with samples from other racial series,
he may replace the not-fully measured parent population by his sample. In other
words, we have two populations A and B, and we have samples a and b from them.
We want to ascertain how far we can suppose A and b to be alike by a consideration
of whether a representing A is alike to b. We are bound by the conditions of
affairs to observe a sample of A; it stands for us as A, but is not really A. How
far does the fact that it is a sample only of A preclude us from ascertaining whether
a sample b of B could with any probability have been obtained from A? This is the
everyday problem of anthropologist, sociologist and most statisticians, but it is also
the problem of “goodness of fit,” and it is indeed the problem by which I originally
reached the ($\chi^2$, P) test, only the paragraph dealing with it reads obscurely and has
been largely overlooked. Let us suppose we have a parent population of $\nu$ categories

* Biometrika, Vol. v. pp. 192–203 (1906, with J. Blakeman); Vol. x. pp. 670–673 (1916); Vol. xi.
pp. 216–230 (1916, with A. W. Young), and this corrected, Vol. xii. p. 260.





with probabilities of $p_1 \dots p_s \dots p_\nu$, that a sample of N has been drawn with
frequencies $n_1 \dots n_s \dots n_\nu$, and that the moment coefficients about any point of this
sample have been found $\mu_1' \dots \mu_u' \dots$, the variates corresponding to the frequencies
being $x_1 \dots x_s \dots x_\nu$. Then

$$N\mu_u' = n_1x_1^u + n_2x_2^u + \dots + n_\nu x_\nu^u,$$

but it will not $= NM_u'$, where $M_u'$ is the uth moment coefficient of the parent
population. Every fresh sample will give a fresh series of moment coefficients,
which will not equal those of the unobserved or unknown parent population. There
is, thus, in this case no question of the limitation of the “degrees of freedom.” When
does such limitation occur? Only when we know the parent population, and therefore
its moments, and fit various curves to that population by aid of t of its moments.
We then have t linear relations among the cell frequencies, and must reduce our
“degrees of freedom” by that number. But surely this is not what we usually
require? We do not know the parent population. We know a sample of it. To this
sample we fit a curve and our problem is: How far may we use this curve to
represent the unknown parent population? How far will further samples give
corresponding $\chi^2$ and $\chi'^2$ when compared with the true parent population and with
the sample from it?

We here reach two very important points: 

(a) The distribution of the fitted curve to any sample gives a far lower
$\chi^2$ when compared with the parent population than the raw sample from which it
was constructed.

(b) The distributions of $\chi^2$ for the parent population and of $\chi'^2$ for the fitted
curve of the sample deduced from any fresh samples are such: (i) that the mean
difference of $\chi^2$ and $\chi'^2$ is small and (ii) that the correlation of $\chi^2$ and $\chi'^2$
is extremely high. I attempted to give a proof of this in my original paper, a
proof which has been considered obscure, but should have more or less indicated
what my problem was. It was not “goodness of fit” of the curve deduced from
the sample to the individual frequencies of that sample, but that of the $\chi'^2$
distribution treated as an approximation to the $\chi^2$ distribution that I had in view. In
other words, I was considering and still want to consider how far we may replace
the unknown parent population by the frequencies of the smooth curve deduced
from a sample. For this purpose I have in the present paper selected a normal
curve to represent the parent population with a standard deviation of 10. Luckily,
for it saved me much labour, Dr Egon S. Pearson had somewhat over 1000 samples
of 15 drawn from such a population by aid of Tippett's Random Sampling Numbers*,
and I am very grateful to him for allowing me the use of them. He had computed the
proportional frequencies for a central tenth of the standard deviation and for thirty
such tenths on either side of the central group. Such a distribution is not an exact
normal distribution, but it is very close to it; thus its standard deviation was 10.0283
without correction, and 9.9909 with Sheppard's correction, both close enough to the
* Tracts for Computers , No. XV. Cambridge University Press. 




value 10 of the actual curve. As a matter of fact we may consider this distribution
the parent one; there is no special merit in considering it an exact normal curve.
Out of Dr Pearson's 15,000 odd samplings from the above parent population I took
eight basic samples, none of which covered the same ground; they were independent
samples from the parent population. These samples were of sizes 600, 300, 150,
105, 60, 45, 30 and 15. I term these the basic samples. The actual frequencies
occurring in each sample I term the Raw Basic Samples. Each of these eight
samples was fitted with a normal curve and the frequencies recomputed from this
normal curve. These distributions I term the Smooth or Graduated Basic Samples.
Finally, for a purpose to be explained later, I reduced all the frequencies of the
Smooth or Graduated Basic Samples to a total of thirty. These may be referred to
as the Graduated Basic Samples reduced. The parental population was reduced to
a total of thirty. I then proceeded to take 100 samples of thirty from the data. These
were independent of each other and of the eight basic samples, i.e. all resulted from
completely independent drawings. Of course had time and energy permitted, it
would have been advisable to have had a large number of basic samples of each
size and more than 100 samples to compare with them, but what has been done
involved the computing of 900 $\chi^2$'s, and that means much labour*. The $\chi'^2$'s
obtained from the smooth basic samples were then compared with the $\chi^2$'s obtained
from the same series of 100 samples as against the parent population, and eight
correlation tables were thus obtained. The close relationship between the $\chi'^2$ from
a smoothed basic sample and the $\chi^2$ from the parent population became at once
manifest, and there were very few cases in which one of the hundred samples would
on the measure of its probability have been rejected or retained as a sample of the
parent population when it would not in the same way have been rejected or retained
by any one of the smoothed basic samples. In other words, the curve provided by
the basic sample is a “good fit” to the parent population, and to judge by the present
experience we are reasonably safe in replacing the unknown parent population by
a graduated basic sample. Now the moment coefficients of the basic samples are
not the same as those of the parent population, nor have they the same values for
each basic sample. What has happened is this, that in calculating the distribution
of the $\chi^2$'s of the 100 samples we have replaced the parent population frequencies
by those of the smoothed basic sample, and the result has shown that we shall not
make many or frequent errors of judgment in so doing. I have only been able to
take eight samples of different sizes as an illustration. Had it been possible to take
50 or 100 samples of each size, we should no doubt have seen the advantage of the
large over the small basic sample with respect to its goodness in representing the
parent population. As a matter of fact in two basic samples one of 30 may be better
than one of 300, though with a large number of samples those of 300 would certainly
as a whole be better than those of 30†.

* The bulk of the computing work was done for me by Dr T. L. Woo, but Sir Georg Hansmann
undertook one series.

† The basic samples of 15 and 30 are for their size extremely favourable, i.e. give results much more
accordant with those of the parent population than would be anticipated.
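The design of the experiment can be imitated on a small scale. The sketch below is an illustration under stated assumptions and not a reconstruction of the original computations: it draws from a pseudo-random normal generator rather than from Tippett's numbers, fits the curve from the ungrouped mean and standard deviation of the basic sample, and uses the fifteen categories of three-tenths of the standard deviation adopted in this paper. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, pstdev

# 14 boundaries from -19.5 to +19.5 in steps of 3, giving 15 categories.
EDGES = [-19.5 + 3 * k for k in range(14)]

def cell_probs(mu, sigma):
    """Category chances of a normal curve over the 15 categories."""
    cdf = [NormalDist(mu, sigma).cdf(e) for e in EDGES]
    return [cdf[0]] + [b - a for a, b in zip(cdf, cdf[1:])] + [1 - cdf[-1]]

def counts(values):
    """Observed frequencies of the values in the 15 categories."""
    out = [0] * 15
    for x in values:
        out[sum(x > e for e in EDGES)] += 1
    return out

def chi_sq(observed, probs):
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))

def pearson_r(a, b):
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def experiment(basic_size, n_tests=100, test_size=30, seed=1):
    """Chi-squared of test samples against the parent and against a fitted curve."""
    rng = random.Random(seed)
    parent = cell_probs(0.0, 10.0)
    basic = [rng.gauss(0.0, 10.0) for _ in range(basic_size)]
    fitted = cell_probs(mean(basic), pstdev(basic))  # graduated basic sample
    chi, chi_dash = [], []
    for _ in range(n_tests):
        obs = counts([rng.gauss(0.0, 10.0) for _ in range(test_size)])
        chi.append(chi_sq(obs, parent))
        chi_dash.append(chi_sq(obs, fitted))
    return chi, chi_dash, pearson_r(chi, chi_dash)

chi, chi_dash, r = experiment(basic_size=150)
print("correlation of chi-squared and chi-dash-squared:", round(r, 3))
```

Varying `basic_size` and `seed` gives a rough impression of how far a single graduated basic sample may stand in for the parent population; any one run is, like the eight samples of the text, only an illustration.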





In obtaining the values of $\chi^2$ it was needful to limit the number of categories,
and ultimately 15 categories were selected each of 3/10ths of the standard deviation,
namely:



Category             Central Value
Below -19.5               —
-19.5 to -16.5           -18
-16.5 to -13.5           -15
-13.5 to -10.5           -12
-10.5 to  -7.5            -9
 -7.5 to  -4.5            -6
 -4.5 to  -1.5            -3
 -1.5 to  +1.5             0
 +1.5 to  +4.5            +3
 +4.5 to  +7.5            +6
 +7.5 to +10.5            +9
+10.5 to +13.5           +12
+13.5 to +16.5           +15
+16.5 to +19.5           +18
Above +19.5               —


It was desirable to have a considerable number of categories, but 15 categories
was rather a large number for samples of 30, and it might have been better to
increase the number of individuals in the test samples, as the small frequency
in some of the categories would militate against the theoretical justification of
replacing binomials by normal curves. As this would apply equally to all the basic
samples and to the parent population, and we were dealing so to speak with
relative values of $\chi^2$ and $\chi'^2$, I do not think it will affect the validity of our results,
as it will influence $\chi^2$ to much the same extent as $\chi'^2$. I had also in mind another
reason for choosing $n_s$ to be small, which will appear later.

(3) Experimental Details. Table I gives the actual data for each basic sample
in four columns. The first of these gives the raw basic frequencies (R.B.F.), the
second gives the frequencies (G.B.F.) of the graduation, in this case a normal
curve, replacing the original sample (R.B.F.); the third column gives the corresponding
frequencies of the parent population (P.P.F.), and the fourth column the
reduced basal frequencies (R.G.F.) for a sample of 30.

In calculating the $\chi^2$ between the parental population and a basic sample, raw
or smoothed by its curve, the full number of the sample was used, the relative
frequencies of the parent population being modified to give the total of the sample.
In treating the basic sample in its turn as a parent population for the 100 samples
of 30, the reduced graduated values of the basic sample were of course employed. In
the computing of $\chi^2$ and the resulting P, where given, there is no question of any
constraint beyond the size of the sample. Each basic sample has its own moment
coefficients, and they are not the same as those of the parent population, nor of
another basic sample of the same size.

Table II gives the mean and standard deviation of the raw data from which the 
smoothed frequencies were computed. 
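The two standard deviations quoted for the grouped parent population, 10.0283 uncorrected and 9.9909 corrected, can be checked against Sheppard's correction, which subtracts one-twelfth of the squared grouping width from the variance. The sketch below is illustrative only; the width of 3 (the three-tenths of the standard deviation used for the 15 categories) is an assumption, chosen because it reproduces the quoted figures, and the function name is hypothetical.

```python
import math

def sheppard_corrected_sd(raw_sd, width):
    """Standard deviation after subtracting width**2 / 12 from the grouped variance."""
    return math.sqrt(raw_sd ** 2 - width ** 2 / 12)

# Assumed grouping width: 3 units, i.e. 3/10 of the standard deviation 10.
corrected = sheppard_corrected_sd(10.0283, 3.0)
print(round(corrected, 3))
```

With a width of 1 unit the corrected value would instead be about 10.024, which does not agree with the figure quoted in the text.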

















TABLE I.

Basic Samples: Raw Frequencies (R.B.F.), Graduated Frequencies (G.B.F.), Relative Parental Population Frequencies (P.P.F.)
and Graduated Frequencies reduced to 30 (R.G.F.).

[Only the columns below are legible in this copy; the remaining columns of Table I are not recoverable.]

                     Sample of 150         Sample of 600
Category             G.B.F.    P.P.F.      P.P.F.
Below -19.5           3.31      3.84       15.35
-19.5 to -16.5        3.14      3.58       14.33
-16.5 to -13.5        5.20      5.86       23.42
-13.5 to -10.5        7.90      8.75       35.01
-10.5 to  -7.5       10.99     11.97       47.86
 -7.5 to  -4.5       14.02     14.96       59.84
 -4.5 to  -1.5       16.39     17.10       68.42
 -1.5 to  +1.5       17.57     17.88       71.54
 +1.5 to  +4.5       17.27     17.10       68.42
 +4.5 to  +7.5       15.56     14.96       59.84
 +7.5 to +10.5       12.86     11.97       47.86
+10.5 to +13.5        9.73      8.75       35.01
+13.5 to +16.5        6.76      5.86       23.42
+16.5 to +19.5        4.30      3.58       14.33
Above +19.5           5.00      3.84       15.35













TABLE II.

Means and Standard Deviations of Basic Samples.

                       Mean        Standard Deviation
Parent Population       0          10*
Size 600               +.4250      10.2276
Size 300               +.6300       9.6291
Size 150               +.9000      10.1380
Size 105               +.4286       9.4334
Size  60               +.8000      10.3262
Size  45               -.5333      11.0041
Size  30               -.6000       9.5796
Size  15               +.6000       9.6867


It will be noted that only two of the means are negative; the odds against so
small a number of negative signs are only about 5 to 1; yet should there be a
series of rather improbable cases arising from Tippett's Random Sampling Numbers,
we must remember that that series itself is a random sample, and may be a rather
unusual one†. Table III compares the $\chi^2$'s and P’s as found from the Raw Basic
and Graduated Basic Samples as against the Parent Population. Two points at once
TABLE III.

Goodness of Fit of Basic Samples to Parent Population.

Basic Sample    Raw Basic Sample              Curve from Raw Basic Sample
Size            χ²          P (n′ = 15)       χ²          P (n′ = 15)
600              8.2890     .8725             1.6789      .999,943
300             14.6609     [illegible]       2.2785      .999,682
150             [illegible] .7703             1.2936      .999,976
105             10.1715     .7491              .7782      [illegible]
 60             19.6397     .1427              .5115      >.999,999
 45              9.8996     .7691             1.0000      [illegible]
 30             12.3442     .5788              .1868      >.999,999
 15             10.8809     [illegible]        .1462      >.999,999

* This was the standard deviation of the parental population curve from which the relative
frequencies of this population were calculated. Working back from these computed frequencies to their
standard deviation, we find 10.0388 for its value, = 0.9009 on applying Sheppard's correction. Corresponding
to this the standard deviations recorded are all corrected values, and the relative frequencies of
the basic samples were computed from the means and these corrected standard deviations.

† This caution is not given wholly unadvisedly. I have not myself made much use of Tippett's
numbers, but recently I obtained in 100 trials three such unusual samples that only one should have
occurred in 1,000,000 trials.

































Karl Pearson 


359 


result from this table. First, all the raw basic samples are, as of course they really must
be, probable samples individually and as a group from the parent population.
Secondly, the curves fitted from the raw basic samples, when tested against the parent
population (note, not against the raw basic frequencies themselves), are most excellent
fits. They can be said to represent with a high degree of accuracy the parent population. The
experience represented must, I think, be of interest and of real value to the
anthropologist, who can rarely if ever measure whole populations, but has always before
him the problem of whether a certain sample can be considered as belonging to a
population he only knows from the graduated frequencies of another sample. We
are not concerned here with the goodness of fit of a graduated curve to its raw
sample, but with the goodness of fit of a graduated curve based on a raw sample to a
graduated parent population from which the raw sample has been drawn. Since what
we are considering in this case is the goodness of fit of a graduated sample to a
graduated normal population, there is no limitation of the degrees of freedom, for
the moments by which the graduation is determined change from sample to
sample*.


(4) Goodness of Fit of Graduated Basic Samples to Raw Basic Samples. Here
there is a point to which attention is not always given, or, perhaps, not sufficient
attention. Many years ago† I showed that if two samples $n_s$, $N$; $n_s'$, $N'$ ($s$
corresponding to the $s$th category out of $v$ categories) were taken from a parent
population in which $p_s$ was the chance of drawing an individual at random from the
$s$th category, then if

$$\chi^2 = \sum_{s=1}^{v} \frac{N N' \left(\dfrac{n_s}{N} - \dfrac{n_s'}{N'}\right)^2}{(N + N')\, p_s},$$

we have $\chi^2$ distributed according to the curve

$$y = y_0\, e^{-\frac{1}{2}\chi^2} \left(\tfrac{1}{2}\chi^2\right)^{\frac{1}{2}(v-3)} \left[d\left(\tfrac{1}{2}\chi^2\right)\right] \qquad \text{(vi)}.$$

But an essential condition of this result is that the series $p_s$ is to be considered constant
throughout the series of pairs of samples. It is only under these conditions that the
constants of $\chi^2$, for example its mean and standard deviation, can be supposed given
by the above distribution‡. In applying the test to two samples, it is always
well to consider what we are assuming our parent population to be. We may of
course put for $p_s$ any series of values we please, and can find the probability that
the two samples belong to the corresponding population. If we have no knowledge
of the parent population, we can use as the best substitute available for $p_s$ the sum
of the two samples, but our result is bound to be unreliable if those samples are
not considerable.
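The two forms of the statistic just described can be put side by side in a short sketch. It is an illustration only: the function name, the optional argument `p` and the small frequencies used are assumptions made here, not part of the paper. When `p` is omitted, the pooled frequencies $n_s + n_s'$ take the place of $(N + N')p_s$.

```python
from typing import Optional, Sequence


def two_sample_chi2(n: Sequence[float],
                    n_prime: Sequence[float],
                    p: Optional[Sequence[float]] = None) -> float:
    """Chi-squared for two samples classified into the same v categories.

    With `p` given, the divisor for category s is (N + N') * p[s], the
    hypothetical parent population being fixed in advance.
    With `p` omitted, the divisor is n[s] + n_prime[s], i.e. the sum of the
    two samples stands in for the unknown parent population.
    """
    if len(n) != len(n_prime):
        raise ValueError("both samples must use the same categories")
    N, N_prime = sum(n), sum(n_prime)
    total = N + N_prime
    chi2 = 0.0
    for s, (a, b) in enumerate(zip(n, n_prime)):
        divisor = total * p[s] if p is not None else a + b
        if divisor == 0:
            continue  # a category empty in both samples contributes nothing
        chi2 += N * N_prime * (a / N - b / N_prime) ** 2 / divisor
    return chi2


# Hypothetical frequencies, chosen only to make the arithmetic easy to follow.
first = [10, 20, 30]
second = [20, 20, 20]
pooled = two_sample_chi2(first, second)
fixed_parent = two_sample_chi2(first, second, p=[0.25, 1 / 3, 5 / 12])
print(pooled, fixed_parent)
```

In this example the fixed `p` was deliberately set equal to the pooled proportions, so the two values coincide; for any further pair of samples tested against the same fixed `p` they would in general differ, which is the distinction the text insists upon.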


* I may note here that I have often been asked: What is the value of so much curve fitting to
samples? The answer is more or less conveyed in the present paper, where we can see that the graduated
basic sample effectively represents a parent population, even in the case of relatively small samples, and
so serves as a standard for measuring the degree of divergence of one population from a second. This
should be done not by comparing raw, but by comparing graduated basic samples.

† Biometrika, Vol. VIII, pp. 250-254, 1911.

‡ There is a still further limitation: $n_s$ and $n_s'$ must not be correlated.








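On this latter assumption the value taken for $\chi^2$ will be

$$\chi^2 = \sum_{s=1}^{v} \frac{N N' \left(\dfrac{n_s}{N} - \dfrac{n_s'}{N'}\right)^2}{n_s + n_s'}.$$

But if we do this we must remember that if we take another pair of samples
indicated by $\tilde n_s$, $\tilde n_s'$, the corresponding $\chi^2$ is not

$$\sum_{s=1}^{v} \frac{N N' \left(\dfrac{\tilde n_s}{N} - \dfrac{\tilde n_s'}{N'}\right)^2}{\tilde n_s + \tilde n_s'}$$

but is equal to

$$\sum_{s=1}^{v} \frac{N N' \left(\dfrac{\tilde n_s}{N} - \dfrac{\tilde n_s'}{N'}\right)^2}{n_s + n_s'},$$

otherwise the distribution of $\chi^2$ is not given by

$$y = y_0\, e^{-\frac{1}{2}\chi^2} \left(\tfrac{1}{2}\chi^2\right)^{\frac{1}{2}(v-3)} \left[d\left(\tfrac{1}{2}\chi^2\right)\right].$$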

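Hence obscurity seems to me to arise when we write

$$\begin{array}{cccccc}
n_1 & n_2 & \ldots & n_s & \ldots & n_v \\
n_1' & n_2' & \ldots & n_s' & \ldots & n_v' \\
(N+N')p_1 & (N+N')p_2 & \ldots & (N+N')p_s & \ldots & (N+N')p_v
\end{array}$$

and replacing the horizontal totals by $n_s + n_s'$ speak of the result as a “contingency
table” with $v$ constraints. It is true that if we write

$$\begin{array}{cccccc|c}
n_1 & n_2 & \ldots & n_s & \ldots & n_v & N \\
n_1' & n_2' & \ldots & n_s' & \ldots & n_v' & N' \\
n_1 + n_1' & n_2 + n_2' & \ldots & n_s + n_s' & \ldots & n_v + n_v' & N + N'
\end{array}$$

this single pair of samples forms a contingency table* with $2v$ cells where the
previous method has only $v$ cells and we speak of $2v$ cells with $v$ degrees of constraint.
But the next or any other pair of samples will give

$$\begin{array}{cccccc|c}
\tilde n_1 & \tilde n_2 & \ldots & \tilde n_s & \ldots & \tilde n_v & N \\
\tilde n_1' & \tilde n_2' & \ldots & \tilde n_s' & \ldots & \tilde n_v' & N' \\
n_1 + n_1' & n_2 + n_2' & \ldots & n_s + n_s' & \ldots & n_v + n_v' & N + N'
\end{array}$$

and although the horizontal totals are still fixed, this is far from being a contingency
table unless $\tilde n_s + \tilde n_s' = n_s + n_s'$ for every pair of samples.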

The relation $\chi^2 = (N + N')\,\phi^2$ holds for the first pair of samples, and this only
when we replace $(N + N')p_s$ by $n_s + n_s'$. Such a relation as that written down leads
the student to believe that $p_s$ is always proportional to $n_s + n_s'$; for every other pair of
samples beyond the first pair this is not true, and accordingly it seems to me a
misfortune to discuss the matter under the heading of a contingency table; it
confuses in the mind of a student the difference between a test, with its
pseudo-contingency table based on a narrow hypothesis, and the true sample contingency
table, where the marginal totals vary, and the only limitation is the single one, i.e.
the size of the whole table. For these reasons I much prefer not to look upon the
two sample test as a case of a contingency table, but as a comparison of the
difference of the relative frequency of two samples with a certain parent population.
Naturally this leads the student to define clearly what his parent population is
supposed to be.

* The equality of $(N + N')\,\phi^2$ and the $\chi^2$ above is of course easily demonstrable.

Now in Table III we have given two illustrations of “Goodness of Fit.” First
we have the raw basic samples and we compare them with the parent population
for 15 categories. There is no doubt in this case that there is no limitation in the
way of constraints beyond the size of the raw basic samples, and we look up Elderton's
Table with $n' = 15$. Secondly we have tested the graduated against the parent
population and we have used two moments, not of the parent population but of the
raw sample; clearly except for the fractions such a sample could arise directly from
sampling the parent population. Such a sample would be rare, and its goodness of
fit is made obvious by the smallness of its $\chi^2$. But we have no constraint; the
moments of the next sample will differ from those of this one. We have by fitting
by moments only selected one of the possible samples of the parent population,
and we find that there are few samples better than it with regard to the fit to the
parent population. To get, in particular from a small sample, the best possible
approach to the parent population may be a difficult problem, but whether we
graduate by two or four moments we are not restricting the number of degrees of
freedom, we are simply selecting a possible sample out of endless possible samples.
Where then does restriction of the degrees of freedom arise? Only, as far as I can see,
when we fit by the moments of a sample a series of curves to the sample using the
same moments in each case and the same number of categories; then the curve
with the lowest $\chi^2$ will have the best fit. But the $P$ of the $\chi^2$ table must be looked
up under $n'$ less the number of moments used*. This is however not the case I
personally have had in view when considering “goodness of fit”; I want to ascertain
how close the graduated sample is to the parent population, not to its raw sample.
How far in the case of unknown parent populations can a measure of further
samples from the parent population given by $\chi^2$'s be replaced by the measure
of departure of these samples from the graduated first sample? This problem will
be answered in the next section; the object of the present section is to consider the
graduated basic sample in relation to the raw basic sample. In our case the
graduated sample has been fitted by two moments: are we to reduce $n'$ by two
constraints? Clearly we are not fitting a series of curves to the one sample, we
have selected a normal curve and we are not asking whether it is a better fit than
a parabola or a sine curve. We must first determine which curve in the present
comparison is to stand as the parent population. Obviously it cannot be the raw
basic sample, for in that case it may have zero frequency in certain cells and

* For example, we might fit as graduation to a sample either the curve [equation illegible in this copy] or the curve
of Type IV [equation illegible in this copy]; then if the categories were $n'$ in number we should have to reduce $n'$
by four in ascertaining their $P$'s.




accordingly the frequency of the normal curve could never have been obtained from
it: that is, the normal curve would be an impossible sample. We must treat the
graduated basic sample as our parent population, and ask what is the probability
that the raw basic sample could be drawn from a parent population with the
relative frequencies of the graduated sample. How we have obtained that parent
population, whether by guess-work or by moments in any number, does not come
into the problem. Here is a parent population, and here again is a sample which
could be drawn from it: what is the probability $P$ of samples like the present or
more remote? It seems to me that this is a reasonable problem, and that it is the
problem we usually desire to answer in curve fitting, rather than the question of
the comparative fit of two curves determined by the same number of moments. If
so, the process by which we have reached our parent population is a matter of
indifference; we have no restriction of our degrees of freedom, beyond the size of
the sample. I get my graduated basic sample, not to test it against other processes
of graduation, but to see how far it may replace the unknown parent population
from which the raw basic sample was drawn, and I do this by testing 100 experimental
samples from a known parent population against that population and against the
graduated basic sample to ascertain what is the relation of their $\chi^2$'s.

Table IV gives the Goodness of Fit of the Raw to the Graduated Basic Samples
in the cases of the eight basic samples. It will be seen at once that the fit is good,
and it should be, because our samples are, owing to the equality of moments, good
ones; but the fit is nothing like as good as the fit of the graduated basic samples
to the true parent population.

TABLE IV.

Fit of Raw Basic to Graduated Basic Samples.

Size of     Raw Basic to Graduated            Raw Basic Samples to
Basic       Basic Samples                     Parent Population
Sample      $\chi^2$    P        Order        $\chi^2$    P        Order
600          7.2899     .9216    1st           8.2890     .8725    1st
300         13.0665     .5214    7th          14.6609     .4024    7th
150          8.8009     .8427    3rd           9.8821     .7703    2nd
105         10.5628     .7193    4th          10.1716     .7491    4th
60          18.8529     .1711    8th          19.6397     .1427    8th
45           7.3810     .9174    2nd           9.8996     .7691    3rd
30          11.9425     .6109    5th          12.3442     .6788    6th
15          12.1290     .5960    6th          10.8809     .6951    5th

This table shows results of considerable importance.
The two orders are very nearly the same, the only interchanges being that of the
2nd and 3rd into the 3rd and 2nd, and of the 5th and 6th into the 6th and 5th.
As we see, the Basic Sample of 300 was a bad one and that of 45 an especially
good one. No doubt had we been able to take a large number of basic samples
of 300 and of 45, these results would have been averaged out, and samples of 300
and of 45 put in more appropriate order. Table III shows us that for practical
purposes there is very little to choose between the fits of all eight graduated basic
samples to the parent population; we cannot place them in order without
recalculating the ($\chi^2$, $P$) table to more figures; all however give us a fit measured by
P > .9996. The same four stand at the top in both orders and the same four at the
bottom. As a rough rule we may therefore say that the raw sample which fits best
its own graduation fits best the parent population. In other words, if we are
seeking the “best” out of a number of samples from an unknown population, that
best will be roughly indicated by the degree of goodness of fit it bears to its own
graduation. Most investigators would say: “Oh, but a sample of 300 must give a
better representation of an unknown parent population than one of 45!” It is of
course true in the long run that we shall get better results from samples of 300
than from samples of 45. But in the present instance we have a case in which an
individual sample of 45, both in its raw (Table IV) and graduated form (Table III),
is a better fit to the parent population than a sample of 300. Of course it is
needful for both samples to be true random samples from the parent population,
not in any way selected for graduation, and they must be graduated by the same
process.
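The $P$'s of Tables III and IV are looked up in Elderton's Table under $n' = 15$, which corresponds to $n' - 1 = 14$ degrees of freedom. As a hedged illustration, the sketch below evaluates the same tail probability directly from the closed form that holds when the number of degrees of freedom is even. The function name is an assumption made here; this is not the method by which the printed table was computed, and small differences from the tabled values in the third decimal are to be expected from interpolation.

```python
import math


def chi2_upper_tail_even_df(chi2: float, df: int) -> float:
    """P that chi-squared with `df` degrees of freedom exceeds `chi2`.

    Valid for even df only, where the tail is a finite sum:
        P = exp(-x) * sum_{k=0}^{df/2 - 1} x**k / k!,   x = chi2 / 2.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    x = chi2 / 2.0
    term = 1.0   # x**k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= x / k
        total += term
    return math.exp(-x) * total


# n' = 15 categories, hence 14 degrees of freedom.
for chi2 in (8.2890, 9.8821, 19.6397):
    print(chi2, round(chi2_upper_tail_even_df(chi2, 14), 4))
```

The three $\chi^2$ values are taken from the "Raw Basic Samples to Parent Population" columns of Table IV (sizes 600, 150 and 60), so the printed probabilities may be compared with .8725, .7703 and .1427.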

(5) Parent Population and Graduated Basic Samples tested against 100 further
Samples of 30 drawn from the Parent Population. This is the main part of our
experimental work, wherein we strive to determine the degree of accuracy with
which the graduated basic sample can be used as representative of the parent
population. It is in its turn to be treated as a parent population and the 100
samples from the original parent population will be tested by their $\chi^2$ from the
latter population and by their $\chi'^2$ from the graduated basic sample as a spurious
or step-parental population.

We shall investigate (a) the mean difference of $\chi^2 - \chi'^2$, (b) its standard deviation
$\sigma_{\chi^2 - \chi'^2}$, and (c) the correlation of $\chi^2$ and $\chi'^2$. As the range of $\chi^2$ and $\chi'^2$ is very
considerable, and the correlation tables could only be formed and published for
fairly considerable subranges, these were taken as unity for $\chi^2$ and $\chi'^2$. To determine
the mean and standard deviation of $\chi^2 - \chi'^2$, actual differences were taken
and grouped in subranges of 0.2. Thus we find that mean $(\chi^2 - \chi'^2)$ is not exactly
equal to mean $\chi^2$ - mean $\chi'^2$ as given by the correlation tables, nor is

$$\sigma^2_{\chi^2 - \chi'^2} = \sigma^2_{\chi^2} + \sigma^2_{\chi'^2} - 2\, r_{\chi^2,\chi'^2}\, \sigma_{\chi^2}\, \sigma_{\chi'^2}$$

exactly as given by the same correlation tables. The accordance however is good.
Correlation Tables A to H tabulate the experience, and Table V gives the chief constants
obtained in the manner described above.

The regressions are approximately linear, and accordingly the constant

$$\sigma_{\chi'^2} \sqrt{1 - r^2_{\chi^2,\chi'^2}}$$

as well as the regression coefficient of $\chi'^2$ on $\chi^2$, or $r_{\chi^2,\chi'^2}\, \sigma_{\chi'^2} / \sigma_{\chi^2}$, have been
added.




TABLE V.

Constants of the Distributions of $\chi^2$ for the Parent Populations and of
$\chi'^2$ for the Basic Samples as Step-Parent Populations.

Columns 2 and 3 are from the distributions of $\chi^2 - \chi'^2$ in 0.2 intervals; columns 4 to 8 are from the Correlation Tables A to H grouped for $\chi^2$ and $\chi'^2$ in unit intervals.

Size of   Mean               $\sigma_{\chi^2-\chi'^2}$   Mean       $\sigma_{\chi'^2}$   $r_{\chi^2,\chi'^2}$   $\sigma_{\chi'^2}\sqrt{1-r^2}$   $r\,\sigma_{\chi'^2}/\sigma_{\chi^2}$
Basic     $\chi^2-\chi'^2$                               $\chi'^2$
Sample
600       + .018 (ii)        .7889 (i)       12.66      4.6402     .985,727 (i)      .7812      .988,646
300       - .996 (vi)       1.6212 (v)       13.61      5.4474     .979,674 (ii)    1.0927     1.166,107
150       - .398 (iv)       1.3046 (iii)     13.07      4.9136     .966,350 (iii)   1.2639     1.026,296
105       +1.938 (viii)     1.6678 (vi)      13.82      5.5222     .966,108 (iv)    1.4255     1.153,149
60        - .164 (iii)      1.2521 (ii)      12.90      4.8017     .964,567 (v)     1.2669     1.001,094
45        + .017 (i)        1.6738 (vii)     12.71      4.2264     .934,983 (viii)  1.4991      .854,126
30        - .429 (v)        1.5685 (iv)      13.15      5.1229     .939,327 (vii)   1.7573     1.040,112
15        -1.280 (vii)      1.9747 (viii)    13.99      5.7277     .952,176 (vi)    1.7501     1.178,813

For the Parent Population: Mean $\chi^2$ = 12.68, $\sigma_{\chi^2}$ = 4.6265.


Now examining this table, it will be seen that the average difference between
$\chi^2$ and $\chi'^2$ is small, and that the variation in the difference is not great. The sixth
column, of the correlation coefficients, shows how highly $\chi^2$ and $\chi'^2$ are correlated;
the lowest correlation occurs with the sample of 45, but even this is greater than .93.
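The last two columns of Table V follow from the three preceding constants. The sketch below recomputes them for the Basic Sample of 600; it is an illustration under the assumption that the residual standard deviation is $\sigma_{\chi'^2}\sqrt{1-r^2}$ and the regression coefficient of $\chi'^2$ on $\chi^2$ is $r\,\sigma_{\chi'^2}/\sigma_{\chi^2}$, and the function name is hypothetical.

```python
import math


def regression_constants(r: float, sd_step: float, sd_parent: float):
    """Return (residual s.d. of chi'^2 about the line, slope of chi'^2 on chi^2).

    r         : correlation of chi^2 and chi'^2
    sd_step   : standard deviation of chi'^2 (step-parent population)
    sd_parent : standard deviation of chi^2 (true parent population)
    """
    residual_sd = sd_step * math.sqrt(1.0 - r * r)
    slope = r * sd_step / sd_parent
    return residual_sd, slope


# Basic Sample of 600: r = .985,727, sigma of chi'^2 = 4.6402,
# and sigma of chi^2 for the Parent Population = 4.6265.
residual_sd, slope = regression_constants(0.985727, 4.6402, 4.6265)
print(round(residual_sd, 4), round(slope, 5))
```

The same call with the constants of any other row gives a quick check on that row of the table and on the slope of the corresponding regression line.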

TABLE VI.

Regression Lines of $\chi'^2$ on $\chi^2$.

Size of Sample    Regression Equation
600               $\chi'^2 = 0.12 + .98865\,\chi^2 \pm 0.63$
300               $\chi'^2 = -1.05 + 1.16611\,\chi^2 \pm 0.74$
150               $\chi'^2 = 0.06 + 1.02630\,\chi^2 \pm 0.85$
105               $\chi'^2 = -0.80 + 1.15315\,\chi^2 \pm 0.96$
60                $\chi'^2 = 0.21 + 1.00109\,\chi^2 \pm 0.85$
45                $\chi'^2 = 1.88 + .85413\,\chi^2 \pm 1.01$
30                $\chi'^2 = -0.04 + 1.04011\,\chi^2 \pm 1.19$
15                $\chi'^2 = -0.96 + 1.17881\,\chi^2 \pm 1.18$


In any case we can deduce $\chi'^2$ from $\chi^2$, or $\chi^2$ from $\chi'^2$, with greater accuracy
than we can find in human beings any character of the right side from a knowledge
of it on the left side; for example, the length of a right thigh bone from a knowledge
of the length of the same bone on the left. I think any one who studies the correlation
coefficients, the correlation tables and the regression lines will agree that, as far as
determining whether a sample $B$ comes from an unknown population $A$, of which
we have only a sample $C$, say of 60 or upwards, we shall rarely be wrong in our
diagnosis, if we ask whether $B$ could have been drawn from the sample $C$ after
graduation. That is to say, in the value of $\chi^2$ we replace the unknown series of cell
frequencies of the parent population by the corresponding series drawn from a
graduated sample of the unknown population.

There is however a point to be noticed here. The distribution of $\chi^2$ given by

$$y = y_0\, e^{-\frac{1}{2}\chi^2} \left(\tfrac{1}{2}\chi^2\right)^{\frac{1}{2}(v-3)}$$

depends on the theoretical cell frequencies $\bar n_s$ being the means of the $n_s$'s, and this is not
correct when they are taken from a graduated sample, although a comparison between the p.p.f. column under “Sample of 30,” with
the r.g.f. columns in all the samples in Table I will show that the differences are
not large. Now the mean $\chi^2$ in samples with $v = 15$ categories ought to be
$v - 1 = 14$, and $\sigma_{\chi^2} = \sqrt{2(v-1)} = 5.2915$. A consideration of columns 4 and 5 of
Table V shows that the $\chi'^2$'s from Graduated Basic Samples only approach these
values very roughly. The mean of their means for $\chi'^2$ is 13.2389 and their mean
$\sigma_{\chi'^2} = 5.05025$. It might be plausibly argued that this is due to the graduated frequencies not being the true means $\bar n_s$,
but when we come to the sampling from the parent population where $\bar n_s$ has
actually been used we find instead of a better correspondence a worse one, namely,
mean $\chi^2 = 12.68$ in place of 14, and $\sigma_{\chi^2} = 4.6265$ in place of 5.2915*. It seems
impossible therefore to attribute the divergence to the graduated frequencies not being equal to $\bar n_s$. There
are two explanations which may account for the non-fulfilment by mean $\chi^2$ and $\sigma_{\chi^2}$
of the theoretical values. First we have used 15 categories throughout, and this
seemed a necessity for purposes of comparison; further, the categories were not
unsuitable when we were comparing the larger samples directly with the parent
population or with one another in their raw and graduated forms. But it is not so
satisfactory when we test the parent population or the graduated samples against
the samples of 30, as the theoretical values in some of the categories become small.
Some experimental work, however, seems to indicate that not very great effect is
produced by the small categories. The second point is that it is due to the
approximate nature of the curve $y = y_0\, e^{-\frac{1}{2}\chi^2} \left(\tfrac{1}{2}\chi^2\right)^{\frac{1}{2}(v-3)}$.

While the true mean $\chi^2 = v - 1$, the deduction of the variance of $\chi^2$ as $2(v - 1)$
depends on the above curve being applicable, which actually it is not. There is a
limit to the value of $\chi^2$ which is, I think, $\chi_l^2 = N(N - \bar n_l)/\bar n_l$, where $N$ is the size of
the sample, and $\bar n_l/N$ the least relative cell frequency of the parental population†. Thus
in order to get the customary equation for $\chi^2$ with an infinite range we require to
make $\bar n_l$ as small as possible, but to do so is to disregard the principle that $\bar n_l$ must
be relatively large compared to $N$ in order that we may replace the binomial by a
normal curve. We are thus thrust on the horns of a dilemma. If we say that
$(\tfrac{9}{10} + \tfrac{1}{10})^N$ is the most skew binomial that can reasonably be represented by a normal
curve, and we take $\bar n_l = \tfrac{1}{10}N$, then $\chi_l^2 = 9N$, and for a small sample we may doubt
whether it is legitimate to treat this as an infinite range.
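The limit discussed above, reached when every individual of the sample falls in the category of least expected frequency, is easily evaluated. The following is a minimal sketch, assuming the form $\chi_l^2 = N(N - \bar n_l)/\bar n_l$ and taking as the least class the frequency 14.33 in 600 read from Table I; the function and variable names are hypothetical.

```python
def chi2_upper_limit(sample_size: float, least_expected: float) -> float:
    """Largest possible chi-squared: all individuals in the least likely cell."""
    return sample_size * (sample_size - least_expected) / least_expected


# Assumption: the smallest parent-population class is 14.33 out of 600,
# so for samples of 30 its expected frequency is 30 * 14.33 / 600.
least_expected_30 = 30 * 14.33 / 600
limit_samples_of_30 = chi2_upper_limit(30, least_expected_30)

# The other horn of the dilemma: least expected frequency equal to N / 10.
limit_tenth = chi2_upper_limit(30, 30 / 10)

print(round(limit_samples_of_30, 1), limit_tenth)
```

On these assumptions the first limit comes to about 1226, of the same order as the figure quoted for the samples of 30, while the second is $9N = 270$, a range that can hardly be called infinite.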

* The divergences are of course not impossible, but they point in one way; actually they are for
difference of means 1.32 ± .36 and for difference of standard deviations .665 ± .252 approximately.

† In the case of the 100 samples of 30 compared with the parent population, $\chi_l^2 = 1220$, and this
range might be treated as approximately infinite for our purposes, but to obtain it we have infringed the
condition as to replacing binomials by normal curves.
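The probable errors attached to these two divergences can be checked by a few lines of arithmetic. The sketch below assumes the customary formulae, probable error of a mean $= .6745\,\sigma/\sqrt{n}$ and of a standard deviation $= .6745\,\sigma/\sqrt{2n}$, with the theoretical $\sigma = \sqrt{28}$ and $n = 100$ samples; whether these were the exact formulae used for the footnote is an assumption.

```python
import math

PROBABLE_ERROR_FACTOR = 0.6745  # probable error = 0.6745 x standard error


def probable_error_of_mean(sd: float, n: int) -> float:
    return PROBABLE_ERROR_FACTOR * sd / math.sqrt(n)


def probable_error_of_sd(sd: float, n: int) -> float:
    return PROBABLE_ERROR_FACTOR * sd / math.sqrt(2 * n)


v = 15                                  # categories
n_samples = 100                         # samples of 30 drawn from the parent population
theoretical_mean = v - 1                # 14
theoretical_sd = math.sqrt(2 * (v - 1)) # 5.2915...

mean_divergence = theoretical_mean - 12.68   # observed mean chi^2 = 12.68
sd_divergence = theoretical_sd - 4.6265      # observed s.d. of chi^2 = 4.6265

pe_mean = probable_error_of_mean(theoretical_sd, n_samples)
pe_sd = probable_error_of_sd(theoretical_sd, n_samples)
print(round(mean_divergence, 2), round(pe_mean, 2))
print(round(sd_divergence, 3), round(pe_sd, 3))
```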





The actual value of the variance of $\chi^2$, when in Equation (i) the $\bar n_s = N p_s$'s are
the true means of the $n_s$'s, is given by*

$$\sigma^2_{\chi^2} = 2(v-1)\left(1 - \frac{1}{N}\right) - \frac{v^2}{N} + \sum_{s=1}^{v} \frac{1}{\bar n_s} \qquad \text{(vii)}.$$

Hence the usual value

$$\sigma^2_{\chi^2} = 2(v - 1)$$

may be modified in two ways: (a) if the sample be small, and $v$ be not small, the
negative term $v^2/N$ may be by no means negligible; for example, if $v = 15$ and
$N = 30$, the term = 7.5, which cannot be neglected as compared with 27.07;
(b) on the other hand the term $\sum (1/\bar n_s)$ is additive, and if we are dealing with a small
sample and with a fair sized $v$, this may be considerable. For example, in the case
of our experimental parent population reduced to a size of 30, $\sum (1/\bar n_s) = 10.597$, so
that the theoretical variance in that case is 30.17 instead of 28, giving $\sigma_{\chi^2} = 5.493$
instead of 5.292, and differing still more from the observed 4.627, the deviation
being 3.4 times its probable error. Even with a sample of 50 in 10 categories so
chosen that no category contains less than 4, and thus $\sum (1/\bar n_s)$ reduced to a minimum,
the term $v^2/N$ will still be 2, and this is not negligible as compared with 15.68.
The use of $\sigma^2_{\chi^2} = 2(v - 1)$ and consequently of (vi) in the case of small samples is
certainly to be deprecated.
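The three terms of (vii) may be evaluated for the parent population reduced to a size of 30. The sketch below is an illustration only: it assumes that the fifteen class frequencies recoverable from Table I for the Sample of 600 (15.35, 14.33, ..., 15.35) are the parent-population frequencies, and the names used are hypothetical.

```python
import math

# Assumption: parent-population class frequencies on a total of 600 (Table I).
PARENT_FREQ_600 = [15.35, 14.33, 23.42, 35.01, 47.86, 59.84, 68.42, 71.54,
                   68.42, 59.84, 47.86, 35.01, 23.42, 14.33, 15.35]


def chi2_variance(expected):
    """Variance of chi-squared by (vii), given the true mean cell frequencies."""
    v = len(expected)
    n_total = sum(expected)
    usual_term = 2 * (v - 1) * (1 - 1 / n_total)   # 2(v-1)(1 - 1/N)
    negative_term = v ** 2 / n_total               # v^2 / N
    additive_term = sum(1 / m for m in expected)   # sum of 1 / n-bar_s
    return usual_term - negative_term + additive_term


expected_30 = [f * 30 / 600 for f in PARENT_FREQ_600]
variance_30 = chi2_variance(expected_30)
print(round(variance_30, 2), round(math.sqrt(variance_30), 3))
```

With these frequencies the additive term comes to about 10.62 rather than 10.597, and the variance to about 30.18, close to the 30.17 of the text; the small difference presumably reflects rounding in the tabled frequencies.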

I again hazard the suggestion that the better distribution of $\chi^2$ in such cases
is to be found from the curve

$$y = y_0 \left(\tfrac{1}{2}\chi^2\right)^{p_1} \left(1 - \frac{\chi^2}{\chi_l^2}\right)^{p_2} \qquad \text{(viii)},$$

where $\chi_l^2 = N(N - \bar n_l)/\bar n_l$,

and mean $\chi^2 = (v - 1)$, $\sigma^2_{\chi^2} = 2(v-1)\left(1 - \dfrac{1}{N}\right) - \dfrac{v^2}{N} + \sum \dfrac{1}{\bar n_s}$.

The Table of the Incomplete B-Function will provide the requisite probability $P$
for a given $\chi^2$.

We can see easily how (viii) passes into a Type III curve if $\chi_l^2$ be large. In that
case we have approximately the form (ix) [equation illegible in this copy], where

$$\lambda = 1 - \frac{1}{N} - \frac{v^2}{2(v-1)N} + \frac{1}{2(v-1)} \sum \frac{1}{\bar n_s}.$$

Hence we obtain (x) [equation illegible in this copy], or since $\chi_l^2$ is large we reach

$$y = y_0' \left(\tfrac{1}{2}\chi^2\right)^{\frac{v-1}{2\lambda} - 1} e^{-\chi^2/(2\lambda)} \qquad \text{(xi)}.$$

It is accordingly $\chi^2/\lambda$ which is approximately given by a Type III curve, and
the power is not $\tfrac{1}{2}(v - 3)$, but $\dfrac{v-1}{2\lambda} - 1$; the probability $P$ will be easily
calculated from the Incomplete Γ-Function Table. In the case of the samples from the
parent population of this paper $\lambda = 1.07737$, and since $\chi_l^2 = 1220$, the transition to
(xi) is reasonably legitimate. But the usual $\chi^2$ will be in error to about 8%.

* Biometrika, Vol. xx, Equation (xii), for the value from a limited parent population, and $\bar n_s$ not
necessarily the mean of $n_s$, and as above for the case of $\bar n_s$ = mean of $n_s$.
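The constant $\lambda$ and the power of the Type III curve follow at once from the variance. The sketch below assumes $\lambda = \sigma^2_{\chi^2}/\{2(v-1)\}$ and the power $(v-1)/(2\lambda) - 1$, and uses the value 10.597 quoted in the text for $\sum(1/\bar n_s)$; the function name is hypothetical.

```python
def type3_constants(v: int, sample_size: float, sum_reciprocals: float):
    """Return (lambda, power) for the Type III approximation to chi-squared.

    lambda is the ratio of the corrected variance to the usual 2(v - 1);
    the power replaces the customary (v - 3) / 2.
    """
    variance = (2 * (v - 1) * (1 - 1 / sample_size)
                - v ** 2 / sample_size
                + sum_reciprocals)
    lam = variance / (2 * (v - 1))
    power = (v - 1) / (2 * lam) - 1
    return lam, power


lam, power = type3_constants(v=15, sample_size=30, sum_reciprocals=10.597)
print(round(lam, 4), round(power, 3), (15 - 3) / 2)
```

On these figures $\lambda$ agrees with the quoted 1.07737 to about three decimals, and the power falls from the customary 6 to roughly 5.5. When $\lambda = 1$ the power reduces to $\tfrac{1}{2}(v-3)$, as it should.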


The justification for (viii) lies in the fact that it gives the true start and range of
the $\chi^2$ curve as well as its true mean and variance. It will probably account with
considerable accuracy for the binomials not closely following a normal distribution;
and with the Tables of the Incomplete Γ- and B-Functions the $P$ corresponding
to (viii) or (xi) may be obtained as quickly as from Palin Elderton's Table.

The whole subject is worthy of further experimental investigation, for if my
conjecture as to the approximate accuracy of (viii) and (xi) be verified, the use of
the $\chi^2$ test could be extended to small samples and small cell frequencies, which
are not suitable in the case of the ordinary ($\chi^2$, $P$) process.

The fundamental experiment of the present paper is in part intended to
illustrate the need for widening the nature of Equation (i). No discriminating
investigation could be based on the present data without increasing much beyond
100 the number of samples taken.


(6) Actual Comparison of the P's from a Parent Population and from a
Graduated Sample from that Population. My original intention was to publish side
by side the $P$'s determined from the $\chi^2$'s and $\chi'^2$'s of the Parent Population and the
eight Graduated Basic Samples. But the large amount of labour and of printing
involved in computing and publishing 900 $P$'s induced me to confine my attention
to a single sample, which I have taken a good way down the list to indicate that a
relatively small basic sample, say 50 to 100, if graduated, will provide a reasonable
indication of whether further samples do or do not belong to the unknown parent
population. The 100 values of $P$ for the 105 Basic Sample are given in Table VII.

TABLE VII.

Comparison of the Probabilities of 100 Samples drawn from a Parent Population
and again supposed to be drawn from a Graduated Basic Sample of 105.

No. = Sample Index No.; P = as drawn from Parent Population; P' = as drawn from Basic Sample.

No.   P      P'     |  No.   P      P'     |  No.   P      P'     |  No.   P      P'
1     .763   .764   |  26    .720   .710   |  51    .815   .804   |  76    .827   .710
2     .204   .125   |  27    .609   .581   |  52    .679   .637   |  77    .303   .200
3     .681   .684   |  28    .462   .434   |  53    .704   .718   |  78    .209   .046
4     .232   .163   |  29    .763   .700   |  54    .990   .971   |  79    .697   .484
5     .202   .281   |  30    .775   .820   |  55    .579   .376   |  80    .846   .872
6     .421   .221   |  31    .819   .718   |  56    .098   .024   |  81    .630   .546
7     .846   .872   |  32    .441   .453   |  57    .832   .746   |  82    .019   .007
8     .942   .932   |  33    .054   .066   |  58    .439   .526   |  83    .469   .270
9     .245   .165   |  34    .310   .283   |  59    .932   .962   |  84    .445   .229
10    .135   .057   |  35    .460   .360   |  60    .128   .041   |  85    .712   .553
11    .688   .725   |  36    .567   .409   |  61    .382   .373   |  86    .973   .982
12    .918   .879   |  37    .320   .307   |  62    .117   .026   |  87    .774   .830
13    .900   .746   |  38    .194   .240   |  63    .381   .461   |  88    .822   .766
14    .060   .071   |  39    .651   .615   |  64    .105   .019   |  89    .788   .721
15    .779   .657   |  40    .192   .087   |  65    .879   .873   |  90    .776   .728
16    .309   .169   |  41    .848   .780   |  66    .456   .305   |  91    .708   .499
17    .747   .625   |  42    .279   .161   |  67    .889   .937   |  92    .902   .873
18    .987   .976   |  43    .999   .999   |  68    .866   .836   |  93    .359   .233
19    .390   .414   |  44    .285   .404   |  69    .869   .894   |  94    .803   .644
20    .857   .854   |  45    .297   .338   |  70    .743   .721   |  95    .174   .072
21    .567   .598   |  46    .575   .492   |  71    .437   .383   |  96    .711   .722
22    .286   .167   |  47    .106   .060   |  72    .180   .112   |  97    .284   .241
23    .860   .895   |  48    .586   .678   |  73    .852   .800   |  98    .477   .310
24    .829   .791   |  49    .945   .949   |  74    .622   .424   |  99    .773   .778
25    .666   .652   |  50    .859   .884   |  75    .231   .163   |  100   .763   .754

The problem turns here on how many samples which belong really to the parent
population would have been rejected by the graduated basic sample. Suppose first
we take the 2% standard. No. 82 would be rejected by both $P$ and $P'$ tests if
an isolated sample. No. 64 would be retained as a sample of the parent population
and rejected as a sample from the basic sample population, had it occurred as an
isolated sample. Actually it or something worse might be expected to occur twice
in 100 samples. Thus dealing with a 2% limit and an isolated sample we should
have made an error once in a hundred times in rejecting a sample from the parent
population and twice in 100 times if we used the basic graduated sample in place
of the parent population. If we used a 5% level, No. 82 would be the only
sample rejected on account of its $P$ value in the case of the parent population,
while Nos. 56, 60, 64, 78 and 82 would all be rejected on the basic sample test.
The reader might hastily pass to the conclusion that the basic sample does not
effectively represent the parent population. But the conclusion is rather that the
present sampling is too favourable. A 5% level means that there are five cases
in the 100 below it; the parent population shows only one, while the graduated
basic sample actually records five.

We may consider the matter from another standpoint. The distribution of the
probability integrals of any continuous curve is a rectangle, every probability
between 0 and 1 being equally likely*. Accordingly the distribution of $P$ and $P'$
should be linear. Dividing into 10 groups we have the following scheme:

TABLE VIII.

Distribution of P and P'.

Probability =                 .00-.10  .10-.20  .20-.30  .30-.40  .40-.50  .50-.60  .60-.70  .70-.80  .80-.90  .90-1.00  Total
Expected                      10       10       10       10       10       10       10       10       10       10        100
Parent Population, P          6        9        11       8        10       5        9        17       16.5     9.5       [illegible]
Basic Sample of 105, P'       9        [illegible]  9.5  6        11       6        8.5      16.5     [illegible]  7     [illegible]

In both cases we find a redundancy of rather favourable samples.

For the Parent Population: $\chi^2 = 15.65$, $P = .075$.

For the Basic Sample of 105: $\chi'^2 = 12.40$, $P' = .193$.

The odds in the first case are about 12 or 13 to 1, and in the second case about
4 or 5 to 1. Thus the Graduated Basic Sample gives the more reasonable result.
Both are possible in the single isolated trial.
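The principle appealed to here, that the probability integrals of any continuous curve are rectangularly distributed, may be illustrated by a small sampling experiment. The sketch below is not a reproduction of the present experiment: it assumes, for illustration only, an exponential parent curve, a fixed seed and ten equal groups, and it measures the departure of the grouped counts from the rectangle by the customary $\chi^2$.

```python
import math
import random


def probability_integral_counts(n, groups=10, seed=0):
    """Draw n values from a unit exponential curve and group their
    probability integrals F(x) = 1 - exp(-x) into equal subranges of 0..1."""
    rng = random.Random(seed)
    counts = [0] * groups
    for _ in range(n):
        x = rng.expovariate(1.0)
        u = 1.0 - math.exp(-x)                 # probability integral of x
        index = min(int(u * groups), groups - 1)
        counts[index] += 1
    return counts


def chi2_against_rectangle(counts):
    """Chi-squared of grouped counts against equal expectation in every group."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


counts = probability_integral_counts(10_000)
print(counts, round(chi2_against_rectangle(counts), 2))
```

With ten groups the resulting $\chi^2$ is to be referred to $n' = 10$, as was done for the two values quoted above.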

The reason for there being less correspondence between the P and P' for the 
series of 100 samples lies in the low standard deviation of the 105 Basic Sample; 
see Table II. It is the lowest of all eight samples. Hence in the case of rare 
individuals being drawn from the Parent Population, they would be still rarer in 
the case of the 105 Basic Sample, and accordingly what is a rare sample from the 
standpoint of the Parent Population will be still rarer in the case of the Basic 
Sample, i.e. when P is small, P' will be still smallerf. 

The reader may ask for some evidence that the Normal Parent Population and 
the Basic Sample would correspond in the same manner when the samples tested 
were drawn not from the former but from an entirely different population. For this 
purpose I took a Rectangular Population, and not to protract matters too severely 
took only ten samples of thirty from it. The values of P for three assumed parent 
populations are given in Table IX. It will be seen at once that the values of P 
for both the Normal Parent Population and the Graduated Basic Sample of 105 
as parent population are on the whole strongly against the samples from the 
Rectangular Population being their offspring; the Graduated Basic Sample is even 
more strenuous than the grand-parental normal curve population, owing to the fact 
of its smaller standard deviation. Naturally the bulk of the contributions to χ² come

* Thus Bayes’ theorem applies accurately to such distributions of P, all chances being equally likely. 

† This reduction of the standard deviation will be somewhat modified by the shifting of the mean,
which is, however, nearly the smallest shift of the series, and this shift, if it compensates for the reduced
standard deviation effect at one tail, will emphasise it at the other.








370 Experimental Discussion of the (χ², P) Test for Goodness of Fit


TABLE IX.

Comparison of values of P from Samples of 30 drawn from a Rectangular Population, with their Parent
Population, the Normal Curve Parent Population, and the Graduated Basic Sample of 105.

Chance of occurrence if drawn from:

Index Number of Sample     Rectangular       Normal Curve as       Normal Curve of Basic
from a Rectangular         Population        Parent Population     Sample of 105 as
Population                 (true parent)                           Parent Population
I                          .6063             .7418                 .5607
II                         .8311             .0054                 .0010
III                        .5265             .0060                 .0045
IV                         .0193             <.000,0005            <.000,0005
V                          .0458             <.000,0005            <.000,0005
VI                         .2562             <.000,004             <.000,005
VII                        .6063             .0239                 .0072
VIII                       .4497             .0024                 .000,003
IX                         .3134             .0011                 .000,170
X                          [illegible]       [illegible]           [illegible]



from random variation in the extreme categories of the rectangle*. We may I think 
conclude that the Parent Population and the Graduated Basic Sample will give the 
same sort of judgment with regard to a population differing from either of them. 

But having said this, we examine our table further and begin to realise the
weakness of small samples. Two true samples (IV and V) out of ten from the rectangle
would be rejected on the P = .05 basis, and one (IV) on the .02 basis. On the
other hand two samples out of ten would be accepted as genuine samples of the
Normal Parent Population, and one out of ten as a genuine sample of the Basic
Sample of 105 on the .05 basis, and two from both on the .02 basis. Indeed,
Sample I is a better sample from a normal population than from a rectangular
population, its true parent.

Of course such anomalies will occur, but if they can occur in ten samples from 
such very different distributions as a normal curve and a rectangle, must we not 
be somewhat anxious whether they will not occur, and more frequently occur, when 
we compare two small samples and assert identity of origin ? The samples may 
really have arisen from two wholly different populations, but far more accordant than 
a rectangle and a normal curve! 

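To illustrate this point I will deduce, by aid of the formula†

$$\chi^2 = \sum_{s=1}^{\nu} \frac{NN'\left(\dfrac{n_s}{N} - \dfrac{n_s'}{N'}\right)^2}{n_s + n_s'} = \sum_{s=1}^{\nu} \frac{(n_s - n_s')^2}{n_s + n_s'}, \quad \text{if } N = N',$$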

* The sampling was done by putting into a box 10 of each of the letters A, B, C, ... M, N, O on tickets.
A ticket was then drawn, its letter recorded, and it was returned to the box. The box had then its
lid closed, it was waved about, rotated and shaken, and then a second ticket drawn, and the process
continually repeated until 300 tickets had been recorded. The two exceptional samples (IV) and (V) arose
from the last letter, O, occurring seven times, and the last but one, N, seven times. Observation showed
that it was not through faulty shaking, as it was not the same ticket repeating itself. Further, it could
hardly be due to “clustering”, as the tickets had been introduced into the box in a manner which avoided
this, and tickets of the same letter did not follow in approximate succession.

† Biometrika, Vol. VIII, p. 252.




Karl Pearson 


371


the probability of our 7 raw basic samples of sizes 15, 30, 45, 60, 105, 150 and
300 actually taken from a normal population being drawn from a rectangular
population.

Using the ungraduated raw samples, we have: 

TABLE X.

Probability of Raw Samples from a Normal Population having a Rectangular Parent.

Size of Samples    15             30             45       60       105      150      300
Values of P        [illegible]    [illegible]    .8306    .3244    .0326    .0051    <.000,0006



It will be clear from this table that even with parent populations so different as
those here dealt with, the test is inadequate to discriminate between raw samples
from these populations, if the samples have not sizes of the order 100, and even
then not on the .02 probability basis. Safety may be said to begin between 105
and 150, and if 50 or below be said to be “small” sample sizes, it is not dogmatic
to assert that the χ² test* ought never to be applied to such small samples.

(7) Conclusions. An endeavour has been made in this paper to mark more
clearly the distinctions the writer had in view in introducing χ² and φ² into
statistical theory and practice.

(i) If χ² be defined as

$$\chi^2 = \sum_{s=1}^{\nu} \frac{(n_s - \bar{n}_s)^2}{\bar{n}_s},$$

then $\bar{n}_s$ is in a succession of samples a constant and equal to the mean value of $n_s$.
If this condition be satisfied, χ² is given approximately, but only approximately, by
the curve

$$y = y_0\,(\chi^2)^{(\nu-3)/2}\,e^{-\chi^2/2},$$

where ν is the number of cells = n' of Elderton's (χ², P) table.

The mean of χ² is ν − 1, but its true standard deviation is not √(2(ν − 1)) but is
given by Equation (vii) above. It is suggested that either Equation (viii) or
Equation (xi) will give a better value for the P corresponding to a given χ², using
either the Incomplete B- or Γ-Function Tables, than Elderton's (χ², P) Table, when
N is not very large or any $1/\bar{n}_s$ is not negligible as compared to unity.

(ii) It has been pointed out that the main use of χ² was intended to be the
comparison of a considerable graduated sample of a parent population with further
samples in order to test whether such samples were or were not likely to be
samples from the parent population, only known through this graduated sample.
It is shown from a series of experimental examples that the χ'²'s and P''s from the
graduated sample are very highly correlated with the χ²'s and P's from the parent
* I mean the form of the χ² test based on the distribution $y = y_0\,(\chi^2)^{(\nu-3)/2}\,e^{-\chi^2/2}$.














population, so that without frequent wrong judgment we may use the step-parental
population in place of the unknown parent population. This amounts to using for
$\bar{n}_s$ the values found from the graduated sample population. They still remain
constant throughout the comparison with further samples, and the larger the size
of the graduated sample the higher on the average will be the correlation between
the χ²'s from the step-parental and true parental populations.

(iii) Incidentally it is pointed out that even for small samples an immensely 
better P is obtained from a graduated than from a raw sample, and this even when 
the size of the sample is small. 

(iv) More than twenty years ago I gave a test for two samples of sizes N and
N', categories $n_s$, $n_s'$, being drawn from the same parent population with relative
frequency of the sth category $p_s$. It consisted in calculating

$$\chi^2 = \sum_{s=1}^{\nu} \frac{NN'}{N+N'}\,\frac{\left(\dfrac{n_s}{N} - \dfrac{n_s'}{N'}\right)^2}{p_s},$$

there being ν categories in both samples, and then applying the (χ², P) table.
Here $p_s$ is supposed throughout the further sampling to be a constant.

If the parent population be supposed unknown, I suggested that the best value
available was $p_s = (n_s + n_s')/(N + N')$; this would not be very accurate for small
samples.


In this case

$$\chi^2 = \sum_{s=1}^{\nu} \frac{NN'\left(\dfrac{n_s}{N} - \dfrac{n_s'}{N'}\right)^2}{n_s + n_s'}.$$


But it has been frequently overlooked that in adopting this value of χ², we must
in measuring P remember that in further samples the denominator $n_s + n_s'$ is
supposed to remain constant, while we vary $n_s$ and $n_s'$ in the numerator, otherwise the
distribution of χ² on which the (χ², P) table is based is incorrect. The whole matter
has in my opinion been rendered unnecessarily obscure by writing the two samples
in the form of a spurious biserial contingency table. This is said to have 2ν cells,
and to lack ν + 1 “degrees of freedom.” If the first pair of samples gives a contingency
table the second will not, and from this manner of approach we lose sight of the
true difference between a real and a spurious contingency table. In the former
all the constituents of the marginal totals are free, and the only limitation is the
total size of the sample in the table.


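(v) If we, however, write χ² in the form of

$$(N+N')\,\phi^2 = \chi^2 = \sum_{s=1}^{\nu} \frac{\left(n_s - \dfrac{N}{N+N'}(n_s + n_s')\right)^2}{\dfrac{N}{N+N'}(n_s + n_s')} + \sum_{s=1}^{\nu} \frac{\left(n_s' - \dfrac{N'}{N+N'}(n_s + n_s')\right)^2}{\dfrac{N'}{N+N'}(n_s + n_s')},$$

we are not only giving ourselves double work, but are apt to forget that to get the
approximate equation for χ² we must consider $n_s + n_s'$ constant in the numerator,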



which it is not if we take another pair of samples. There is nothing whatever to
prevent our choosing

$$\phi^2 = \sum_{s=1}^{\nu} \frac{NN'}{N+N'}\,\frac{\left(\dfrac{n_s}{N} - \dfrac{n_s'}{N'}\right)^2}{n_s + n_s'}$$

as our measure of accordance of the two samples, making $n_s$, $n_s'$ vary in both
numerator and denominator. But if this be done, the distribution of φ² is not that
of χ²/(N + N'), and it cannot be deduced from the (χ², P) table. Even approximate
values of the mean and variance are complicated, and the experimental study of the
distribution of φ² has only been started by Professor Kondo's recent paper.
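
The algebraic identity behind (iv) and (v) can be checked numerically. The following is a minimal sketch, not the author's computation: the function names and the category counts are hypothetical, chosen only for illustration. It evaluates the single-sum two-sample form and the two-sum "contingency" form and shows that they give the same χ², with φ² obtained by dividing by N + N'.

```python
def chi2_two_samples(n, n_prime):
    """Single-sum form: sum of N N' (n_s/N - n_s'/N')^2 / (n_s + n_s')."""
    big_n, big_n_prime = sum(n), sum(n_prime)
    return sum(
        big_n * big_n_prime * (a / big_n - b / big_n_prime) ** 2 / (a + b)
        for a, b in zip(n, n_prime)
        if a + b > 0  # empty categories contribute nothing
    )


def chi2_contingency_form(n, n_prime):
    """Two-sum form, each sample compared with its share of the pooled count."""
    big_n, big_n_prime = sum(n), sum(n_prime)
    pooled = big_n + big_n_prime
    total = 0.0
    for a, b in zip(n, n_prime):
        if a + b == 0:
            continue
        expected_a = big_n * (a + b) / pooled
        expected_b = big_n_prime * (a + b) / pooled
        total += (a - expected_a) ** 2 / expected_a
        total += (b - expected_b) ** 2 / expected_b
    return total


# Hypothetical category counts for two samples of unequal size.
first = [6, 9, 11, 8, 10, 6]
second = [3, 5, 9, 7, 4, 2]
chi2 = chi2_two_samples(first, second)
phi2 = chi2 / (sum(first) + sum(second))
print(chi2, chi2_contingency_form(first, second), phi2)
```

The two forms agree for any single pair of samples; the point made in the text concerns which quantities are held constant when further pairs are imagined, and that is not something this arithmetic check can settle.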

(vi) If we take a sample, graduate it by aid of t moments and then compare it
with any population, it is, apart from fractions of a unit, a possible sample from
that population, and we are at liberty to look out P in the ordinary (χ², P) table
and judge where it stands among other possible samples of the same size and the
same categories. We have not limited anything by obtaining our graduated
sample by moments. When we do limit by moments is when we fit a series of
curves to a given distribution by moments, the curve moments being in each case
those of the given distribution. In such a case we compare the relative goodness
of fit of various curves obtained by t moments and our degrees of freedom are reduced
by t. An application of such limitation of degrees of freedom was made by me
in 1915 and applied to the case of death-rates. It was shown in the memoirs concerned
with this topic that in certain cases not only the degrees of freedom, but
the value of χ² with which the table was entered might need modification*. Another
case of the modification of both ν and χ² to get a better measure of P is indicated
in this paper: see pp. 366–367 above.

(vii) We have also seen in this paper that it is still probably legitimate to
calculate P from χ² when, owing to the smallness of the sample and of its categories,
the distribution $y = y_0\,(\chi^2)^{(\nu-3)/2}\,e^{-\chi^2/2}$ is no longer accurate. In such cases
we are thrown back on frequency curves which are generalisations of the usual
χ²-curve, and which can be integrated by aid of the Incomplete Γ-Function and the
Incomplete B-Function Tables. There is here a field for much experimental work of
a useful kind.

* Biometrika , Vol. xi, pp. 145—184. 





TABLE A.

χ'² from Basic Sample of 600 and χ² from Parent Population.

[Correlation table. Columns: Parent Population χ². Rows: Basic Sample χ'². The cell frequencies are not recoverable.]

TABLE B.

χ'² from Basic Sample of 300 and χ² from Parent Population.

[Correlation table. Columns: Parent Population χ². Rows: Basic Sample χ'². The cell frequencies are not recoverable.]

TABLE C.

χ'² from Basic Sample of 150 and χ² from Parent Population.

[Correlation table. Columns: Parent Population χ². Rows: Basic Sample χ'². The cell frequencies are not recoverable.]

TABLE D.

χ'² from Basic Sample of 105 and χ² from Parent Population.

[Correlation table. Columns: Parent Population χ². Rows: Basic Sample χ'². The cell frequencies are not recoverable.]

TABLE E.

χ'² from Basic Sample of 60 and χ² from Parent Population.

[Correlation table. Columns: Parent Population χ². Rows: Basic Sample χ'². The cell frequencies are not recoverable.]

TABLE F.

χ'² from Basic Sample of 45 and χ² from Parent Population.

[Correlation table. Columns: Parent Population χ². Rows: Basic Sample χ'². The cell frequencies are not recoverable.]

TABLE G.

χ'² from Basic Sample of 30 and χ² from Parent Population.

[Correlation table. Columns: Parent Population χ². Rows: Basic Sample χ'². The cell frequencies are not recoverable.]

TABLE H.

χ'² from Basic Sample of 15 and χ² from Parent Population.

[Correlation table. Columns: Parent Population χ². Rows: Basic Sample χ'². The cell frequencies are not recoverable.]

ON THE DISTRIBUTION OF THE CORRELATION 
COEFFICIENT IN SMALL SAMPLES* 

By PAUL R. RIDER, Washington University, Saint Louis. 

It was the original purpose of this study to attempt to discover the effect upon 
the distribution in random samples, particularly in small samples, of the product- 
moment coefficient of correlation, r, when the samples are drawn from a non-normal 
instead of a normal population. In Part I the results of sampling from certain popu¬ 
lations which differ greatly from the normal are given, also the results of sampling 
from a normal population having a high degree of correlation. As the sampling was 
done experimentally, it was necessary to deal with discrete populations. This opened 
up the question of the effect of grouping upon the distribution of r, a question which 
is investigated in Part II. 

I. Sampling from Non-Normal Populations and from a Normal 
Population having High Correlation. 

Description of the Populations sampled . 

The populations sampled will be termed rectangular, triangular, and normal, and 
will be designated by R , T, and N, respectively. 

Rectangular Population. The rectangular population may best be characterized
by its correlation table, which is composed of 10 × 10 compartments, each with the
same frequency. That is, there is equal probability, in random sampling, of obtaining
any pair of values within a limited (rectangular) region. Obviously the correlation, ρ,
in such a population is zero. If x and y are the variates, then for the marginal
distributions we have $\beta_1(x) = \beta_1(y) = 0$, $\beta_2(x) = \beta_2(y) = 1.77\dot{5}\dot{7}$. (The dots indicate
a repeating decimal.) For the corresponding continuous bivariate distribution,
$\rho = 0$, $\beta_1(x) = \beta_1(y) = 0$, $\beta_2(x) = \beta_2(y) = 1.8$. The frequency surface is a rectangular
parallelopiped, or, with suitably chosen units, a cube.
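
The marginal constants of the discrete rectangular population can be reproduced by exact arithmetic. The sketch below is illustrative only; `marginal_betas` is a hypothetical helper, and the class marks are assumed to be 1, 2, ..., 10 (any equally spaced marks give the same β's).

```python
from fractions import Fraction


def marginal_betas(frequencies):
    """Return (beta1, beta2) for class marks 1, 2, ... with the given frequencies."""
    total = sum(frequencies)
    marks = range(1, len(frequencies) + 1)
    mean = Fraction(sum(f * x for f, x in zip(frequencies, marks)), total)

    def central_moment(order):
        return sum(f * (x - mean) ** order for f, x in zip(frequencies, marks)) / total

    mu2, mu3, mu4 = central_moment(2), central_moment(3), central_moment(4)
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


# Each of the 10 marginal classes of the rectangular table has the same frequency.
beta1, beta2 = marginal_betas([1] * 10)
print(beta1, beta2, float(beta2))
```

Working in fractions makes the repeating decimal visible: β₂ comes out as an exact ratio, which tends to 1.8 as the number of classes increases.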

Triangular Population. The correlation table of the triangular population is also
composed of 10 × 10 cells, but the frequencies in all of the cells on one side of
a principal diagonal are zeros, the frequencies in the remaining cells having a constant
value different from zero. It is found that

$$\rho = \tfrac{1}{2}, \quad \beta_1(x) = \beta_1(y) = 0.32\dot{6}, \quad \beta_2(x) = \beta_2(y) = 2.3\dot{6}.$$

* This investigation was made possible by assistance from a grant made by the Rockefeller
Foundation to Washington University for Research in Science. The writer wishes to express his sincere
thanks for this grant, and also to make grateful acknowledgment of valuable criticisms, suggestions,
and assistance given by Dr Egon S. Pearson.





For the corresponding continuous distribution the frequency surface is a right prism,
with a right triangle for a base. Its constants are

$$\beta_1(x) = \beta_1(y) = 0.32, \quad \beta_2(x) = \beta_2(y) = 2.4.$$
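
The constants of the discrete triangular population can be checked in the same spirit. The sketch below rests on an assumption that the text does not state explicitly: the cells on the principal diagonal itself are taken to carry the constant frequency, so that the non-zero cells are those with y ≤ x. The function name is hypothetical.

```python
import math
from fractions import Fraction


def triangular_constants(size=10):
    """Exact moments of a size x size table with unit frequency where y <= x."""
    cells = [(x, y) for x in range(1, size + 1) for y in range(1, size + 1) if y <= x]
    count = len(cells)
    mean_x = Fraction(sum(x for x, _ in cells), count)
    mean_y = Fraction(sum(y for _, y in cells), count)

    def moment(index, mean, order):
        return sum((cell[index] - mean) ** order for cell in cells) / count

    var_x, var_y = moment(0, mean_x, 2), moment(1, mean_y, 2)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in cells) / count
    return {
        "cells": count,
        "rho": float(cov) / math.sqrt(float(var_x * var_y)),
        "beta1_x": moment(0, mean_x, 3) ** 2 / var_x ** 3,
        "beta2_x": moment(0, mean_x, 4) / var_x ** 2,
        "beta1_y": moment(1, mean_y, 3) ** 2 / var_y ** 3,
        "beta2_y": moment(1, mean_y, 4) / var_y ** 2,
    }


for name, value in triangular_constants().items():
    print(name, value, float(value))
```

Under this assumption the correlation comes out at one half, and the two marginal distributions share the same β₁ and β₂ because one is the mirror image of the other.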

Normal Population. The normal population sampled is the one shown in Table I. 
Its frequencies were calculated from tables of volumes of the normal surface*. It 


TABLE I.

Normal Correlation Table.

                                                          Totals
     –      –      –      –      1     29     32            62
     –      –      –      5    223    349     29           606
     –      –      8    618   1566    223      1          2416
     –      5    618   2582    618      5      –          3828
     1    223   1566    618      8      –      –          2416
    29    349    223      5      –      –      –           606
    32     29      1      –      –      –      –            62

Totals
    62    606   2416   3828   2416    606     62          9996

ρ = 0.9 with Sheppard's correction.
  = 0.83 without Sheppard's correction.



represents a population in which the coefficient of correlation is 0.9. On account of
the grouping, the actual coefficient as computed from the table is 0.83, but when
Sheppard's correction is applied to the two standard deviations involved in the
denominator of the coefficient of correlation, its value is 0.9009. The values of β₁
and β₂ for the marginal distributions are 0 and 2.9390 respectively. With Sheppard's
corrections, β₂ = 2.9361.
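
The effect of grouping on the coefficient can be reproduced from Table I. The sketch below is a check, not the writer's own working. It takes the cell frequencies so that each row and column adds to the printed marginal total, assumes class marks −3, ..., 3 in units of the class width, and assumes the rows run from the highest class to the lowest; the function name is hypothetical. Sheppard's correction is applied by subtracting 1/12 from each variance.

```python
import math

# Rows: y = +3 down to -3; columns: x = -3 up to +3 (assumed orientation).
TABLE_I = [
    [0, 0, 0, 0, 1, 29, 32],
    [0, 0, 0, 5, 223, 349, 29],
    [0, 0, 8, 618, 1566, 223, 1],
    [0, 5, 618, 2582, 618, 5, 0],
    [1, 223, 1566, 618, 8, 0, 0],
    [29, 349, 223, 5, 0, 0, 0],
    [32, 29, 1, 0, 0, 0, 0],
]
X_MARKS = [-3, -2, -1, 0, 1, 2, 3]
Y_MARKS = [3, 2, 1, 0, -1, -2, -3]


def correlation_from_table(table, x_marks, y_marks, sheppard=False):
    """Product-moment correlation of a grouped table with unit class width."""
    cells = [
        (f, x, y)
        for row, y in zip(table, y_marks)
        for f, x in zip(row, x_marks)
    ]
    total = sum(f for f, _, _ in cells)
    mean_x = sum(f * x for f, x, _ in cells) / total
    mean_y = sum(f * y for f, _, y in cells) / total
    var_x = sum(f * (x - mean_x) ** 2 for f, x, _ in cells) / total
    var_y = sum(f * (y - mean_y) ** 2 for f, _, y in cells) / total
    cov = sum(f * (x - mean_x) * (y - mean_y) for f, x, y in cells) / total
    if sheppard:
        var_x -= 1.0 / 12.0  # h^2 / 12 with class width h = 1
        var_y -= 1.0 / 12.0
    return cov / math.sqrt(var_x * var_y)


print(correlation_from_table(TABLE_I, X_MARKS, Y_MARKS))
print(correlation_from_table(TABLE_I, X_MARKS, Y_MARKS, sheppard=True))
```

Only the variances are corrected; the product moment is left as it stands, which is what the text describes when it speaks of the correction being applied to the two standard deviations in the denominator.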

It was suggested by Karl Pearson, in a letter to the writer, that it would be 
desirable to test by actual sampling, whether observed values of r actually follow 
the theoretical distribution when there is high correlation and grouped frequencies 
in the sampled population. It was the object of this particular experiment to test 
the theory. 

Results of Sampling. 

The sampling was effected by the use of Tippett's numbers†. One thousand
samples of 5 pairs each were obtained from each of the populations R, T, and N. By
clubbing these together in the case of T and of N, five hundred samples of 10 pairs
each were obtained from each of these two populations. The observed distributions
of r are given in Table II.

* See Tables for Statisticians and Biometricians, Part II, Table VIII, pp. 105–106.
† L. H. C. Tippett, Random Sampling Numbers (Cambridge University Press, Tracts for Computers,
No. 15).





TABLE II.

Distribution of r in Samples of 5 and of 10.

                                  Frequencies
     r                  R₅       T₅      T₁₀       N₅      N₁₀
  .95 to 1.00            9       48        3      235       34
  .90 to  .95           10       59        9    175.5       97
  .85 to  .90           17       49       11    146.5       92
  .80 to  .85           20       62       17       83       86
  .75 to  .80           19       64       28     92.5       46
  .70 to  .75           22       59       27     33.5       56
  .65 to  .70           35       64       40       36       23
  .60 to  .65           21       47       46       65       21
  .55 to  .60           33       55       54       12       15
  .50 to  .55           23       46       31       27       15
  .45 to  .50           23       53       36       14        4
  .40 to  .45           27       50       36       21        3
  .35 to  .40           33       38       38        2        3
  .30 to  .35           30       25       19       14        1
  .25 to  .30           27       40       22        4        –
  .20 to  .25           26       27       22        4        2
  .15 to  .20           33       31       15        8        –
  .10 to  .15           32       14       22        4        –
  .05 to  .10           33       20        6        1        –
  .00 to  .05           24     16.5      2.5        4        –
  .00 to -.05           33     17.5      4.5        4        –
 -.05 to -.10           29       15        2        –        1
 -.10 to -.15           32        4        1        –        1
 -.15 to -.20           28        9        –        4        –
 -.20 to -.25           29      9.5        3        4        –
 -.25 to -.30           28     11.5        3        2        –
 -.30 to -.35           24       11        –        –        –
 -.35 to -.40           28        5        –        –        –
 -.40 to -.45           27        8        –        4        –
 -.45 to -.50           41        6        –        –        –
 -.50 to -.55           24        5        –        –        –
 -.55 to -.60           28        6        –        –        –
 -.60 to -.65           29        4        1        –        –
 -.65 to -.70           21        2        –        –        –
 -.70 to -.75           21      5.5        –        –        –
 -.75 to -.80           16      6.5        –        –        –
 -.80 to -.85           17        3        –        –        –
 -.85 to -.90           27        1        –        –        –
 -.90 to -.95            9        1        1        –        –
 -.95 to -1.00          12        2        –        –        –

 Total                1000     1000      500     1000      500

R₅ refers to samples of 5 from rectangular population R.
T₅ and T₁₀ refer to samples of 5 and of 10 respectively from triangular population T.
N₅ and N₁₀ refer to samples of 5 and of 10 respectively from normal population N.




Pearson curves were fitted to three of these distributions with the following
results: for 1000 samples of 5 from the rectangular population, a Type II curve*

y = 1000 × 0.60632 (1 − r²)^[…] ...(1);

for 1000 samples of 5 from the triangular population, a Type I curve

y = 1000 × 0.11167 (1.64616 + r)^[…] (0.92826 − r)^(−0.1[…]) ...(2);

for 500 samples of 10 from the triangular population, a Type IV curve

y = […] (1 + x²/[…]²)^(−5.80145) e^(−7.4[…] tan⁻¹(x/[…])) ...(3),

in which x = r − 0.9653.


In cases (2) and (3) the fit was effected by equating the first four moments of the
curve to the corresponding moments of the observations (without correcting for
abruptness). In fitting the Type II curve to the distribution of r in 1000 samples of
5 from the rectangular population it was found that

r̄ = 0.04685, μ₂ = 0.2060225, μ₃ = −0.0001043661625,

μ₄ = 0.13714900615625, β₁ = 0.00005817, β₂ = 1.944985,

a = […] = 0.9895.

Assuming that the small value of β₁ justifies a Type II curve, we should get

y = […] [1 − ((r − 0.04685)/0.9895)²]^[…] ...(1a).

It seemed more desirable, however, to fit a curve of the type $y = y_0(1 - r^2)^k$, determining
k and $y_0$ so that theoretical and observed values are equal for standard
deviations and totals. It is readily found that

$$k = \frac{1 - 3\sigma_r^2}{2\sigma_r^2}, \qquad y_0 = \frac{N\,\Gamma(2k+2)}{2^{2k+1}\,[\Gamma(k+1)]^2} \quad\text{or}\quad \frac{N\,\Gamma(k + 3/2)}{\sqrt{\pi}\,\Gamma(k+1)},$$

N being the total number of values of r, that is, the number of samples. Equation (1)
was derived in this fashion.
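
As a hedged illustration of this fitting rule, the sketch below computes k and y₀ from the number of samples and the variance of r. The function name is hypothetical. The example call does not use the observed moments of the rectangular samples; it uses the normal-theory value σ_r² = 1/(n − 1) = 1/4 for n = 5 and ρ = 0, for which the fitted curve should reduce to the form 636.62 (1 − r²)^(1/2) of equation (4).

```python
import math


def fit_symmetric_r_curve(n_samples, variance):
    """Constants k and y0 of y = y0 (1 - r^2)^k matching total and variance."""
    if not 0.0 < variance < 1.0:
        raise ValueError("the variance of r must lie between 0 and 1")
    k = (1.0 - 3.0 * variance) / (2.0 * variance)
    # Area under (1 - r^2)^k on (-1, 1) is sqrt(pi) Gamma(k+1) / Gamma(k+3/2).
    y0 = n_samples * math.gamma(k + 1.5) / (math.sqrt(math.pi) * math.gamma(k + 1.0))
    return k, y0


def y0_alternative(n_samples, k):
    """The same constant written with Gamma(2k+2), as in the first form above."""
    return n_samples * math.gamma(2 * k + 2) / (2 ** (2 * k + 1) * math.gamma(k + 1) ** 2)


k, y0 = fit_symmetric_r_curve(1000, 0.25)
print(k, y0, y0_alternative(1000, k))
```

The two expressions for y₀ are equivalent by the duplication formula for the Γ-function, so either may be used according to which tables are at hand.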

Although the exact distribution of r for samples of size n from a normal population
has been given‡, the general distribution equation is somewhat complicated,
except in the case when ρ = 0. In the latter case for the distribution of 1000 samples
of 5 it becomes

$$y = 1000 \cdot \frac{2}{\pi}\,(1 - r^2)^{1/2} = 636.62\,(1 - r^2)^{1/2} \qquad (4),$$

an equation which should be compared with (1).

A graphical comparison of results is made in Figures 1–6. The graphs of the
theoretical distributions of r for samples from continuous normal populations were
plotted from tables of ordinates of the frequency curves of the correlation coefficient§.

* See next paragraph.

† For notation see Elderton: Frequency Curves and Correlation.

‡ R. A. Fisher: “Frequency distribution of the values of the correlation coefficient in samples from
an indefinitely large population.” Biometrika, Vol. X (1914–15), pp. 507–521.

§ Biometrika, Vol. XI (1915–17), pp. 379 ff.









In the case of samples of 10 from a normal population it will be recalled that
the value of ρ, if Sheppard's correction be not applied, is 0.83. The value of ρ in the
corresponding continuous population is 0.9. The theoretical distribution curves for
both ρ = 0.8 and ρ = 0.9 are plotted. The histogram representing the actual samples

Fig. 1. Distribution of r in 1000 samples of 5. ρ = 0.5 (correlation in sampled population). The histogram represents the observed distribution in samples from a triangular population. The solid curve is a Pearson curve fitted to the observations. The dashed curve is the theoretical distribution for samples from a continuous normal population. (Axes: value of r; frequency.)

Paul R. Rider 387


seems to lie closer to the curve corresponding to ρ = 0.8, and it appears that an even better fit would be obtained if a curve for ρ = 0.83 were interpolated between the two given curves.

Fig. 2. Distribution of r in 500 samples of 10. ρ = 0.5 (correlation in sampled population). The histogram represents the observed distribution in samples from a triangular population. The solid curve is the graph of equation (5), a curve of the form $y = 500 \times 0.27275\,(1 + r)^{k_1}(1 - r)^{k_2}$. The dashed curve is the theoretical distribution for samples from a continuous normal population. (Axes: value of r; frequency.)

It will be noted that the Pearson curves fitted to the values of r from the samples drawn from the triangular population fail to have the proper range. It was consequently thought desirable to fit a curve of the type $y = y_0 (1 + r)^{k_1} (1 - r)^{k_2}$. It


Fig. 3. Distribution of r in 500 samples of 10. ρ = 0.5 (correlation in sampled population). The histogram represents the observed distribution in samples from a triangular population. The solid curve is a Pearson curve fitted to the observations. The dashed curve is the theoretical distribution for samples from a continuous normal population. (Axis: value of r.)

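is found that if the fitting is done by the method of moments the constants have the following values:

$$k_1 = \frac{(1+\bar{r})^2 (1-\bar{r}) - (\bar{r}+3)\,\sigma_r^2}{2\sigma_r^2},$$

$$k_2 = \frac{(1-\bar{r})^2 (1+\bar{r}) + (\bar{r}-3)\,\sigma_r^2}{2\sigma_r^2},$$

$$y_0 = \frac{N\,\Gamma(k_1+k_2+2)}{2^{k_1+k_2+1}\,\Gamma(k_1+1)\,\Gamma(k_2+1)},$$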

Fig. 4. Distribution of r in 1000 samples of 5. ρ = 0 (correlation in sampled population). The histogram represents the observed distribution in samples from a rectangular population. The solid curve is a Pearson curve fitted to the observations. The dashed curve is the theoretical distribution for samples from a continuous normal population. (Axis: value of r.)

in which N is the number of samples. For the samples of 10 the equation of the 
curve is 

y * 500 x 0*27275 (1 + r)’^(l-r) 1 * 7 ™ ..(5). 

The graph is shown in Figure 1. 
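The moment fitting of the curve $y = y_0(1 + r)^{k_1}(1 - r)^{k_2}$ may be illustrated as follows. This is a sketch under the assumption that $(1 + r)/2$ follows a Beta distribution with indices $k_1 + 1$ and $k_2 + 1$; the function name is hypothetical, and the inputs are the experimental mean and standard deviation quoted in Table VII for the 500 samples of 10 from the triangular population (0.4910 and 0.2483).

```python
import math

def fit_skew_curve(r_mean, r_sd, n_samples):
    """Method-of-moments constants for y = y0 * (1 + r)**k1 * (1 - r)**k2 on (-1, 1)."""
    var = r_sd ** 2
    k1 = ((1 + r_mean) ** 2 * (1 - r_mean) - (r_mean + 3) * var) / (2 * var)
    k2 = ((1 - r_mean) ** 2 * (1 + r_mean) + (r_mean - 3) * var) / (2 * var)
    # y0 = N * Gamma(k1 + k2 + 2) / (2**(k1 + k2 + 1) * Gamma(k1 + 1) * Gamma(k2 + 1))
    log_y0 = (math.lgamma(k1 + k2 + 2) - (k1 + k2 + 1) * math.log(2.0)
              - math.lgamma(k1 + 1) - math.lgamma(k2 + 1))
    return k1, k2, n_samples * math.exp(log_y0)

k1, k2, y0 = fit_skew_curve(0.4910, 0.2483, 500)
print(round(k1, 2), round(k2, 2), round(y0 / 500, 3))
```

Because the inputs are rounded to four places, the constants obtained in this way agree with the leading factor 0.27275 of equation (5) only to about three figures.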


TABLE III.

Goodness of Fit. Samples of 5 from Rectangular Population. (ρ = 0.)

      r             Observed    Fitted     "Normal     (f_o − f)²/f   (f_o − f)²/f
                    frequency   Pearson    theory"     Pearson        "Normal
                    f_o         curve      frequency   curve          theory"

  .95 to  1.00          9          9.0        6.7        .0000          .7896
  .90 „    .95         10         14.8       11.9       1.5568          .3034
  .85 „    .90         17         17.4       15.4        .0092          .1662
  .80 „    .85         20         19.6       18.0        .0082          .2222
  .75 „    .80         19         21.4       20.1        .2692          .0602
  .70 „    .75         22         22.8       21.9        .0106          .0005
  .65 „    .70         35         24.0       23.4       5.0417         5.7504
  .60 „    .65         21         25.1       24.9        .6697          .6108
  .55 „    .60         33         26.0       26.0       1.8846         1.8846
  .50 „    .55         23         26.8       27.1        .5388          .6203
  .45 „    .50         23         27.5       28.0        .7364          .8929
  .40 „    .45         27         28.1       28.7        .0431          .1007
  .35 „    .40         33         28.6       29.6        .6769          .3905
  .30 „    .35         30         29.0       30.1        .0345          .0003
  .25 „    .30         27         29.4       30.5        .1959          .4016
  .20 „    .25         26         29.7       31.1        .4609          .8363
  .15 „    .20         33         30.0       31.3        .3000          .0923
  .10 „    .15         32         30.1       31.6        .1199          .0051
  .05 „    .10         33         30.2       31.7        .2596          .0531
  .00 „    .05         24         30.3       31.8       1.3099         1.9132
  .00 „   −.05         33         30.3       31.8        .2406          .0453
 −.05 „   −.10         29         30.2       31.7        .0477          .2230
 −.10 „   −.15         32         30.1       31.6        .1199          .0051
 −.15 „   −.20         28         30.0       31.3        .1333          .3479
 −.20 „   −.25         29         29.7       31.1        .0165          .1418
 −.25 „   −.30         28         29.4       30.5        .0667          .2049
 −.30 „   −.35         24         29.0       30.1        .8621         1.2362
 −.35 „   −.40         28         28.6       29.6        .0126          .0865
 −.40 „   −.45         27         28.1       28.7        .0431          .1007
 −.45 „   −.50         41         27.5       28.0       6.6273         6.0357
 −.50 „   −.55         24         26.8       27.1        .2925          .3546
 −.55 „   −.60         28         26.0       26.0        .1538          .1538
 −.60 „   −.65         29         25.1       24.9        .6060          .6751
 −.65 „   −.70         21         24.0       23.4        .3750          .2462
 −.70 „   −.75         21         22.8       21.9        .1421          .0370
 −.75 „   −.80         16         21.4       20.1       1.3626          .8363
 −.80 „   −.85         17         19.6       18.0        .3449          .0556
 −.85 „   −.90         27         17.4       15.4       5.2966         8.7377
 −.90 „   −.95          9         14.8       11.9       2.2730          .7067
 −.95 „  −1.00         12          9.0        6.7       1.0000         4.1925

 Totals              1000        999.6      999.6      34.1421        39.5168

For fitted Pearson curve, χ² = 34.1421, n = 38, P = 0.648.

For "normal theory" frequency, χ² = 39.5168, n = 39, P = 0.448.























Fig. 5. Distribution of r in 1000 samples of 5. The histogram represents the observed distribution in samples from the normal correlation table; ρ = 0.83, without Sheppard's correction; ρ = 0.901, with Sheppard's correction. The solid curve is the theoretical distribution for samples from a continuous normal population having ρ = 0.8. The dashed curve is the theoretical distribution for a continuous normal population having ρ = 0.9. (Axes: value of r; frequency.)



Fig. 6. Distribution of r in 500 samples of 10. The histogram represents the observed distribution in samples from the normal correlation table; ρ = 0.83, without Sheppard's correction; ρ = 0.901, with Sheppard's correction. The solid curve is the theoretical distribution for samples from a continuous normal population having ρ = 0.8. The dashed curve is the theoretical distribution for a continuous normal population having ρ = 0.9. (Axis: frequency.)


The corresponding equation for samples of 5 was not worked out, but it was noted that the value of $k_2$ was negative, yielding a curve very much like the solid curve of Figure 2, starting, however, at r = −1 and having an asymptote at r = 1.

Goodness of Fit.

The computation, especially the mechanical quadrature, necessary to apply the χ² goodness of fit test to all of the distributions did not seem warranted. However, the test was applied in several instances.

The application to the distribution of r in the samples of 5 from the rectangular population (the case represented in Figure 3) is shown in Table III. The fitted Pearson curve is given by (1); the "normal theory" frequencies corresponding to samples of 5 from a normal population in which the correlation ρ is zero were obtained from Fisher's curve (4). For the Pearson curve, χ² = 34.1421, and since there are 40 groups and the theoretical distribution has been made to agree with the observed in the total and in the standard deviation, n = 38, using the notation n + 1 = n′ of Elderton's Table (Table XII of Pearson's Tables for Statisticians and Biometricians, Part I). These values are beyond the range of this Table, but by means of Tables of the Incomplete Γ-Function it is found that P = 0.648. (R. A. Fisher's

TABLE IV.

Goodness of Fit. Samples of 5 from Triangular Population. (ρ = 0.5.)

      r            Observed     "Normal      (f_o − f)²/f
                   frequency    theory"
                   f_o          frequency, f

   .9 to  1.0        107          98.9           .66
   .8 „    .9        111         131.8          3.28
   .7 „    .8        123         124.2           .01
   .6 „    .7        111         109.0           .04
   .5 „    .6        101          93.2           .65
   .4 „    .5        103          78.8          7.43
   .3 „    .4         63          66.3           .16
   .2 „    .3         67          55.7          2.29
   .1 „    .2         45          46.7           .06
   0  „    .1         36.5        39.2           .19
   0  „   −.1         32.5        32.8           .00
  −.1 „   −.2         13          27.4          5.54
  −.2 „   −.3         21          22.9           .16
  −.3 „   −.4         16          19.0           .47
  −.4 „   −.5         14          15.6           .16
  −.5 „   −.6         11          12.7           .23
  −.6 „   −.7          6          10.1          1.66
  −.7 „   −.8         12           7.7          2.40
  −.8 „   −.9 }        5 }         5.4 }
  −.9 „  −1.0 }        2 }         2.6 }         .12

 Totals             1000.0      1000.0         25.51

χ² = 25.51, n = 18, P = 0.112.











approximate method* yields the value P = 0.654.) For the Fisher curve for samples of 5 from uncorrelated normal material, χ² = 39.5168, n = 39 (since there are 40 classes and the theoretical distribution is made to agree only in the total), P = 0.448. (Fisher's approximate method gives the value P = 0.454.)
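The two ways of obtaining P mentioned above can be compared in a short sketch. It is an illustration only: the exact tail is computed from the standard finite series for the χ² integral (taking the place of the Tables of the Incomplete Γ-Function), and the approximation treats $\sqrt{2\chi^2} - \sqrt{2n - 1}$ as a unit normal deviate; the function names are hypothetical.

```python
import math

def chi_square_tail(chi2, n):
    """Exact probability that chi-square with n degrees of freedom exceeds chi2."""
    chi = math.sqrt(chi2)
    if n % 2 == 0:
        # Even n: exp(-chi2/2) * sum_{j < n/2} (chi2/2)**j / j!
        term, total = 1.0, 1.0
        for r in range(1, n // 2):
            term *= chi2 / (2 * r)
            total += term
        return math.exp(-chi2 / 2) * total
    # Odd n: two-sided normal tail plus a finite series in odd powers of chi.
    tail = math.erfc(chi / math.sqrt(2))
    density = math.exp(-chi2 / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (n - 1) // 2 + 1):
        if r > 1:
            term *= chi2 / (2 * r - 1)
        total += term
    return tail + 2 * density * total

def fisher_approx_tail(chi2, n):
    """Approximate P from t = sqrt(2 * chi2) - sqrt(2n - 1), taken as a normal deviate."""
    t = math.sqrt(2 * chi2) - math.sqrt(2 * n - 1)
    return 0.5 * math.erfc(t / math.sqrt(2))

for chi2, n in [(34.1421, 38), (39.5168, 39)]:
    print(n, round(chi_square_tail(chi2, n), 3), round(fisher_approx_tail(chi2, n), 3))
```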

The goodness of fit test for samples of 5 from the triangular population is shown in Table IV. The "normal theory" frequencies are those corresponding to samples of 5 from a normal population in which ρ = 0.5†.

Here χ² = 25.51, n = 18 (n′ = 19 for use in Pearson's Tables for Statisticians and Biometricians), and P = 0.112.

For samples of 10 from the triangular population (see Table V) the value of χ² is 46.89, n = 22, and P = 0.00153, the only extremely bad fit noted.

TABLE V. 

Goodness of Fit Samples of 10 from Triangular Population, (p = 0*5.) 




Observed 

“Normal 

(/.-/)• 

r 


frequency 

theory” 



/. 

frequency, f 

/ 

*95 to 

1*00 

J 3 

J 

ro-7 

8-47 

•90 „ 

•95 


■ 

14-6 

•86 „ 

•90 

11 

12-9 

*28 

•80 „ 

•86 

17 

22-4 

1-30 

•76 „ 

•80 

28 

30-6 

•20 

•70 „ 

•75 

27 

36-8 

2*61 

•06 „ 

•70 

40 

40-1 


•60 „ 

•65 

46 

411 

•58 

•65 „ 

•60 

54 

40*2 

4*74 

•50 „ 

•55 

31 

38*0 

1-29 

•46 „ 

•50 

36 

34*9 

•03 

•40 „ 

•45 

36 

31-6 


•35 „ 

•40 

38 

27-9 

3-66 

•30 „ 

*35 

19 

24-3 

116 

*26 „ 

•30 

22 

20-9 

*06 

•20 „ 

•25 

22 

17*7 

1*04 . 

•16 „ 

•20 

15 

14-9 


•io „ 

•15 

22 

12-4 

7*43 

•05 „ 

•10 

6 

10*2 

1*73 

•00 „ 

•05 

2*5 


8-4 

4*14 

•oo „ 

-•05 

4*5 


6*8 

•78 

-•05 „ 
- -io „ 

-•10 

-•15 

{! 

1 

f6-4 

L4-3 

4*63 

-•15 „ 
-•20 „ 

- *20 
-•25 

{3 

\ 

f3*4 

(2-6 

1:50 

- -25 „ 

-•30 

/ 3 


(2*0 


-•30 „ 

-•35 


1*5 

*62 

Below 

-•35 

u 


[3*6 


Totals 

500*0 

800-0 

46*89 


χ² = 46.89, n = 22, P = 0.00153.


* Put $t = \sqrt{2\chi^2} - \sqrt{2n - 1}$; then $P = \frac{1}{\sqrt{2\pi}} \int_t^{\infty} e^{-t^2/2}\,dt$ approximately.

† See Biometrika, Vol. XI, p. 381. The frequencies were obtained from the ordinates by quadrature.











The distribution of r for samples of 10 from the normal correlation table (Table I) in which ρ = 0.83 before Sheppard's correction is applied (ρ = 0.9 after) was tested for goodness of fit after making Fisher's transformation* $z = \tanh^{-1} r$, $\zeta = \tanh^{-1} \rho$. The variate z is then approximately normally distributed with mean

$$\bar{z} = \zeta + \frac{\rho}{2(n-1)} \left\{ 1 + \frac{1 + \rho^2}{8(n-1)} \right\} \qquad (6)$$

and variance

$$\sigma_z^2 = \frac{1}{n-1} + \frac{4 - \rho^2}{2(n-1)^2} + \frac{176 - 21\rho^2 - 21\rho^4}{48(n-1)^3} \qquad (7).$$

In the present instance, it is found that $\bar{z} = 1.23533$, $\sigma_z^2 = 0.13588$. The transformation and the χ² test are worked out in Table VI. It is found that P = 0.024. The fit is not very good, but the main discrepancies are somewhat irregularly scattered throughout the distribution.
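The mean and variance of z quoted above can be checked numerically. The sketch below assumes series of the form $\bar{z} = \zeta + \frac{\rho}{2(n-1)}\{1 + \frac{1+\rho^2}{8(n-1)}\}$ and $\sigma_z^2 = \frac{1}{n-1} + \frac{4-\rho^2}{2(n-1)^2} + \frac{176 - 21\rho^2 - 21\rho^4}{48(n-1)^3}$; these forms are an assumption, adopted here because they reproduce the values 1.23533 and 0.13588 given in the text for ρ = 0.83 and n = 10. The function names are hypothetical.

```python
import math

def z_mean(rho, n):
    """Approximate mean of z = atanh(r) in samples of n (assumed series, see lead-in)."""
    zeta = math.atanh(rho)
    return zeta + rho / (2 * (n - 1)) * (1 + (1 + rho ** 2) / (8 * (n - 1)))

def z_variance(rho, n):
    """Approximate variance of z = atanh(r) in samples of n (assumed series, see lead-in)."""
    m = n - 1
    return (1 / m
            + (4 - rho ** 2) / (2 * m ** 2)
            + (176 - 21 * rho ** 2 - 21 * rho ** 4) / (48 * m ** 3))

print(round(z_mean(0.83, 10), 5), round(z_variance(0.83, 10), 5))
```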


TABLE VI. 


Goodness of Fit. Samples of 10 from Normal Population having High Correlation. 

Fisher’s Transformation. 


r 

t = tanh" 1 r 

Z ~2 

Normal 

area 

i(l + *) 

A 


Observed 

frequency 

/» 

(/.-/>* 

/ 


00 

1 -831 7808 
1-472 2195 

1 *256 1528 
1*098 6123 
•972 9551 
•867 3006 
•775 2987 
•693 1472 
•618 3813 
•549 3061 
•484 7003 
•423 6489 
•365 4438 
•309 5196 
•255 4128 
•202 7326 
•151 1404 
•100 3353 
•050 0417 
•0000000 
-•050 0417 

00 

+ 1-6182 
+ -6427 
+ -0565 

- *3709 

- -7118 

- *9985 

- 1*2481 
- 1 -4709 
-1-6738 
-1-8612 

- 2*0365 
-2*2021 
-2*3600 
-2-5117 
-2-6586 
-2-8014 

- 2*9414 
-3-0792 
-3*2157 
-3*3515 

1-00000 

*94719 

■73978 

•52253 

*64464 

*76171 

*84098 

*89400 

•92934 

*95291 

•96864 

*97914 

•98617 

•99086 

-99399 

•99608 

•99745 

•99837 

•99897 

•99935 

•99960 



34 

97 

92 

86 

46 

66 

23 

21 

16 

15 

4 

{? 

2 

«0)U 

219 

•43 

2*54 

•07 

2*67 

6-79 

•46 

*77 

•87 

6-38 

•27 

•00 

•03 






500-00 

500 

23-47 


$\bar{z} = 1.2353$, $\sigma_z = 0.3686$, χ² = 23.47, n = 12, P = 0.024.


* See R. A. Fisher, "On the 'probable error' of a Coefficient of Correlation deduced from a small Sample," Metron, Vol. I, No. 4 (Sept. 1, 1921), pp. 3–32. E. S. Pearson has given an illustration showing the degree of accuracy of this approximation in a very similar case (n = 10, ρ = .8) (Biometrika, Vol. XXI, p. …).

† Note that we have a positive and a negative value of $(z - \bar{z})/\sigma_z$.



















Comparison of Constants . 

In Table VII the means, the standard deviations, and the betas of the distribu¬ 
tions of r for the observed samples from the various populations are compared with 
the respective constants* for the corresponding theoretical distributions for samples 
from normal populations. 

TABLE VII.

Comparison of Constants for Distribution of Correlation Coefficient r.

Sampling Experiment                          r̄         σ_r       β1                            β2

1000 samples of 5 from rectangular population (ρ = 0.0)
   Experiment                              −0.0031    0.5153    0.00005817 (√β1 = 0.0076)     1.9450
   Normal theory                            0         0.5000    0                             2
   Standard Error                           0.0158    0.0079    (σ_√β1 = 0.0447)              0.0447

1000 samples of 5 from triangular population (ρ = 0.5)
   Experiment                               0.4653    0.4127    1.3455                        3.9442
   Normal theory                            0.4517    0.4239    1.0315                        3.4191
   Standard Error                           0.0134    0.0104    0.1312                        0.2096

500 samples of 10 from triangular population (ρ = 0.5)
   Experiment                               0.4910    0.2483    0.8882                        5.4426
   Normal theory                            0.4787    0.2671    0.7431                        3.6774
   Standard Error                           0.0119    0.0098    0.2085                        0.4737

1000 samples of 5 from normal population (ρ = 0.83 for discrete frequencies and ρ = 0.9 for continuous)
   Experiment                               0.7852    0.2306    5.0457                        9.2953
   Normal theory, ρ = 0.80                  0.7541    0.2691    5.4065                        9.7830
   Standard Error                           0.0085    0.0126    –                             –
   Normal theory, ρ = 0.83                  0.7873    0.2461    –                             –
   Normal theory, ρ = 0.90                  0.8687    0.1748   13.0290                       21.7679
   Standard Error                           0.0055    0.0127    –                             –

500 samples of 10 from normal population (as for samples of 5)
   Experiment                               0.8012    0.1433    3.5357                        9.4080
   Normal theory, ρ = 0.80                  0.7819    0.1461    3.1377                        8.0634
   Standard Error                           0.0065    0.0087    –                             –
   Normal theory, ρ = 0.83                  0.8135    0.1288    3.72                          9.2
   Normal theory, ρ = 0.90                  0.8887    0.0832    5.7475                       13.6667
   Standard Error                           0.0037    0.0066    –                             –


The standard errors shown are all approximate. They were obtained as follows: The standard errors of β1 and β2 were taken from Tables XXXVII and XXXVIII of Karl Pearson's Tables for Statisticians and Biometricians, Part I. In the case in which β1 = 0 the approximate formula†

$$\sigma_{\sqrt{\beta_1}} = (\beta_4 - 6\beta_2 + 9)^{1/2} / N^{1/2}$$

* The values corresponding to ρ = 0.9 and ρ = 0.8 were found directly in Biometrika, Vol. XI (1915–17), pp. 379 ff. The values corresponding to ρ = 0.83 were obtained by interpolation.

† See E. S. Pearson, "The Test of Significance for the Correlation Coefficient," Journal of the American Statistical Association, Vol. XXVI (1931), p. 18.











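for the standard error of $\sqrt{\beta_1}$ was used. For the other constants the formulae*

$$\sigma_{\bar{r}} = \frac{\sigma_r}{\sqrt{N}}, \qquad \sigma_{\sigma_r} = \frac{\sigma_r}{2} \sqrt{\frac{\beta_2 - 1}{N}}$$

were employed. Here N is the number of samples, which in the present cases is either 1000 or 500.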
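A short numerical check of these standard errors may be helpful. The sketch assumes the forms $\sigma_r/\sqrt{N}$ for the mean, $\tfrac{1}{2}\sigma_r\sqrt{(\beta_2 - 1)/N}$ for the standard deviation, and $\sqrt{(\beta_4 - 6\beta_2 + 9)/N}$ for $\sqrt{\beta_1}$ in the symmetrical case, with $\beta_4 = \mu_6/\mu_2^3$; these readings are assumptions, chosen because they reproduce the entries of Table VII. The value β4 = 5 used below is worked out here for the curve $(1 - r^2)^{1/2}$ and does not appear in the original. Function names are hypothetical.

```python
import math

def se_mean(sigma_r, n_samples):
    """Standard error of the mean of r over N samples."""
    return sigma_r / math.sqrt(n_samples)

def se_sd(sigma_r, beta2, n_samples):
    """Standard error of the standard deviation of r over N samples."""
    return 0.5 * sigma_r * math.sqrt((beta2 - 1) / n_samples)

def se_sqrt_beta1_symmetric(beta2, beta4, n_samples):
    """Standard error of sqrt(beta1) when beta1 = 0 (symmetrical curve)."""
    return math.sqrt((beta4 - 6 * beta2 + 9) / n_samples)

# Normal theory, samples of 5, rho = 0: sigma_r = 0.5, beta2 = 2, beta4 = 5, N = 1000.
print(round(se_mean(0.5, 1000), 4),
      round(se_sd(0.5, 2, 1000), 4),
      round(se_sqrt_beta1_symmetric(2, 5, 1000), 4))
```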

In most instances the deviations of the constants from their expected values in samples from normal populations are not improbable. The worst discrepancy seems to be that of β2 in the 500 samples of 10 from the triangular population. For the samples from the normal population having high correlation the constants seem to be in substantial agreement with the theoretical values for samples from a continuous normal population having ρ = 0.83, which is the value obtained from the sampled correlation table without Sheppard's correction for grouping.


II. The Effect of the Coarseness of Grouping. 

Description of the Populations sampled . 

To study the effect of the coarseness of grouping upon the distribution of r two correlation tables were constructed from tables of volumes of the normal surface† for ρ = 0.5. These are shown in Tables VIII and IX and are designated Populations A and B respectively.

TABLE VIII. 


Population A. 


Totals 


Totals 



ρ = 0.4602, without Sheppard's correction.
ρ = 0.49008, with Sheppard's correction.


Population A. This is a 7 × 7 cell table. The class interval is 0.80σ, or 0.82σ if Sheppard's correction has been applied. The value of ρ as actually calculated from the table is 0.4602. With Sheppard's correction (applied to the two marginal standard

* See footnote † on p. 396.

† See Tables for Statisticians and Biometricians, Part II, Table VIII, pp. 78–136.







deviations involved in ρ), ρ = 0.49008, essentially the same as in the continuous frequency surface.


TABLE IX.

Population B.

                                               Totals

       0       9      90     170      68         337
       9     227    1028     940     170        2374
      90    1028    2273    1028      90        4509
     170     940    1028     227       9        2374
      68     170      90       9       0         337

Totals
     337    2374    4509    2374     337        9931

ρ = 0.4377, without Sheppard's correction.

ρ = 0.4924, with Sheppard's correction.

Population B. This is a 5 × 5 cell table. The class interval is 1.16σ, or 1.23σ if Sheppard's correction has been applied. The value of ρ calculated from the table is 0.4377. With Sheppard's correction, ρ = 0.4924.
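The effect of Sheppard's correction on Population B can be reproduced from the cell frequencies of Table IX. The sketch below is illustrative: it takes the class interval as the unit, subtracts 1/12 from each marginal variance, and assumes that the first printed row is the highest class of the second variable (this fixes only the sign of ρ). The names are hypothetical.

```python
import math

# Cell frequencies of Population B as printed in Table IX (5 x 5).
TABLE_B = [
    [0, 9, 90, 170, 68],
    [9, 227, 1028, 940, 170],
    [90, 1028, 2273, 1028, 90],
    [170, 940, 1028, 227, 9],
    [68, 170, 90, 9, 0],
]

def grouped_correlation(table, sheppard=False):
    """Return (rho, class interval in units of the marginal standard deviation)."""
    size = len(table)
    centre = (size - 1) / 2
    total = sum(sum(row) for row in table)
    # Columns give x; rows give y, the first printed row being taken as the highest class.
    cells = [(j - centre, centre - i, table[i][j]) for i in range(size) for j in range(size)]
    mean_x = sum(x * f for x, _, f in cells) / total
    mean_y = sum(y * f for _, y, f in cells) / total
    var_x = sum((x - mean_x) ** 2 * f for x, _, f in cells) / total
    var_y = sum((y - mean_y) ** 2 * f for _, y, f in cells) / total
    cov = sum((x - mean_x) * (y - mean_y) * f for x, y, f in cells) / total
    if sheppard:
        var_x -= 1 / 12  # Sheppard's correction, class interval = 1
        var_y -= 1 / 12
    return cov / math.sqrt(var_x * var_y), 1 / math.sqrt(var_x)

print([round(v, 4) for v in grouped_correlation(TABLE_B)])
print([round(v, 4) for v in grouped_correlation(TABLE_B, sheppard=True)])
```

Only the marginal variances are corrected; the product moment is left unchanged, which is why the corrected ρ is the larger.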


Results of Sampling . Comparison of Constants. 

Five hundred samples of 5 pairs each were drawn (by Tippett's numbers) from Population A and also from Population B*. These were combined to form 250 samples of 10 from each of the populations.

In the case of samples of 10 from Population A, the values of r were calculated both without and with Sheppard's correction. In the rather rare instances in which this correction led to a value of r greater than unity the value r = 1 was used.

The resulting distributions of r are shown in Table X and are compared with 
the corresponding theoretical distributions from a continuous population in Tables XI 
and XII through the values of the means, the standard deviations, and the two 
betas. There is some indication, supported by the result of the goodness of fit test 
provided in Tables XIII and XIV below, that the distribution of r for samples 
from a grouped normal population tends to be the same as that for samples from 

* Twenty indeterminate forms occurred in calculating r from samples from Population B. By indeterminate form is meant a sample in which one of the standard deviations used in calculating r is zero (or in which both of them are zero). A typical example that actually occurred is


B 

Eg 

D 

gs 


-1 

H 

B 




0 


Each sample like this was discarded and replaced by a new sample.









a continuous population having p equal to the value obtained from the grouped 
population without applying Sheppard’s correction. This tendency is also marked 
in the samples from the normal population having high correlation. (See Part I.) 

TABLE X. 

Distribution of r in Samples of 5 and of 10 from Populations A and B. 


r 

Frequencies 


n* 

^10 


•95 to 

1-00 

14 

15 

1 

1 

•90 „ 

•95 

26 

33 

1 

4 

•85 „ 

•90 

27 

34 

4 

2 

•80 „ 

•85 

21 

28 

4 

8 

*75 „ 

•80 

33 

31 

18 

15 

•70 „ 

•75 

28 

12 

14 

14 

•65 „ 

•70 

26 

19 

21 

15 

•oo „ 

•65 

38 

41 

17 

20 

•55 „ 

•00 

20 

16 

23 

26 

•60 „ 

•55 

33-6 

27 

18 

20 

•45 „ 

•50 

27*5 

21 

15 

15 

•40 „ 

•45 

27 

30 

14 

17 

•35 „ 

•40 

16 

18 

18 

7 

•30 „ 

•35 

12 

17 

15 

17 

•25 „ 

•30 

13*5 

13 

12 

17 

■20 „ 

•25 

7*5 

18 

11 

7 

15 „ 

•20 

13 

11 

7 

8 

•10 „ 

•15 

11 

10 

7 

4 

•05 „ 

•10 

19 

5 

7 

4 

•oo „ 

•05 

9*5 

19 

7 

5 

•oo „ 

-•05 

10-5 

17 

2 

7*5 

-•05 „ 

-•10 

3 

0 

5 

6*6 

-10 „ 

-•15 

4 

7 

2 

2 

-*15 „ 

-•20 

7 

6 

1 

3 

- -20 „ 

- -25 

8 

9 

3 

2 

- "25 „ 

-•30 

5 

4 

l 

2 

-•30 „ 

-•35 

3 

3 

— 

1 

- 35 „ 

-•40 

4 

3 

2 

1 

•40 „ 

- -45 

6 

12 

— 

2 

-•46 „ 

-•50 

2 

1 

— 

— 

-•60 „ 

-•55 

6 

4 

— 

1 

" ‘66 „ 

-•60 

3 

1 

— 

— 

-•60 „ 

-•65 

2 

6 

— 

— 

- -65 „ 

-•70 

3 

3 

— 

— 

- •70 „ 

-•75 

2 

1 

— 

— 

-•75 „ 

-•80 

3 

3 

— 

— 

-•80 „ 

-•85 

2 

1 

— 

— 

- -86 „ 

-•90 

2 

— 

— 

— 

-•90 „ 

-•95 

2 

— 

— 

— 

- -95 „ 

-1-00 

— 

2 

— 

— 

Totals 

500 

500 

250 

250 


$A_5$ and $A_{10}$ refer to samples of 5 and of 10 respectively from Population A.
$B_5$ and $B_{10}$ refer to samples of 5 and of 10 respectively from Population B.










TABLE XI.

Constants of Distribution of r for 500 Samples of 5.

Population                                          r̄         σ          β1        β2

Experimental:
   A (7 × 7 cells), ρ = 0.4602, *ρ_c = 0.49008     0.4255    0.42305    1.1281    3.5818
   B (5 × 5 cells), ρ = 0.4377, *ρ_c = 0.4924      0.4224    0.4279     0.8340    3.2480

Theoretical:
   Continuous distribution,† ρ = 0.5               0.4517    0.4239     1.0315    3.4191
        „          „         ρ = 0.4               0.3584    0.4528     0.5909    2.8097
        „          „         ρ = 0.4602            0.4143    0.4363     –         –
        „          „         ρ = 0.4377            0.3933    0.4429     –         –


TABLE XII.

Constants of Distribution of r for 250 Samples of 10.

Population                                          r̄         σ         β1        β2

Experimental:
   A (7 × 7 cells), ρ = 0.4602, *ρ_c = 0.49008     0.4440    0.2662    0.4400    3.0453
   B (5 × 5 cells), ρ = 0.4377, *ρ_c = 0.4924      0.4431    0.2794    0.5583    3.2786
   A (Sheppard's correction applied to
      each sample)                                 0.4780    0.2855    0.5289    3.1995

Theoretical:
   Continuous distribution,† ρ = 0.5               0.4787    0.2671    0.7431    3.6774
        „          „         ρ = 0.4               0.3813    0.2917    0.4374    3.1669
        „          „         ρ = 0.4602            0.4398    0.2777    0.606     3.45
        „          „         ρ = 0.4377            0.4179    0.283     0.538     3.33


* ρ_c is the value of the correlation coefficient when Sheppard's correction has been applied to the marginal standard deviations of the correlation table.

† The values of the constants of the continuous distribution are independent of the number of samples, although dependent on the number in the sample.

Fisher's transformation $z = \tanh^{-1} r$, used in Part I, was also used here to obtain theoretical frequencies.

For Population A: $\zeta = \tanh^{-1} 0.4602 = 0.497565$, $\bar{z} = 0.523562$ by formula (6), $\sigma_z = 0.373324$ by formula (7); the goodness of fit test, worked out in Table XIII, gives P = 0.9503. For Population B: $\zeta = \tanh^{-1} 0.4377 = 0.469382$, $\bar{z} = 0.494101$, $\sigma_z = 0.373515$, P = 0.502. (See Table XIV.)
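The way theoretical frequencies follow from the transformation can be sketched as below. The illustration transforms each class boundary of r to z, standardises it with the mean and standard deviation quoted for Population A, and takes differences of the normal integral multiplied by the 250 samples. The function names are hypothetical.

```python
import math

def normal_cdf(x):
    """Area of the unit normal curve to the left of x."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def expected_frequency(r_lo, r_hi, z_bar, sigma_z, n_samples):
    """Expected number of samples with r_lo <= r < r_hi, treating z = atanh(r) as normal."""
    lo = (math.atanh(r_lo) - z_bar) / sigma_z
    hi = (math.atanh(r_hi) - z_bar) / sigma_z
    return n_samples * (normal_cdf(hi) - normal_cdf(lo))

# Population A, samples of 10: z_bar = 0.523562, sigma_z = 0.373324, 250 samples.
for lo, hi in [(0.90, 0.95), (0.85, 0.90)]:
    print(lo, hi, round(expected_frequency(lo, hi, 0.523562, 0.373324, 250), 2))
```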


Summary and Conclusions. 

Actual samples of 5 and of 10 were drawn from bivariate populations differing greatly from the normal. Population correlations ρ = 0 and ρ = 0.5 were used. The distributions of r were not essentially different from the theoretical distributions in samples from a normal population which may be considered in fact as providing






TABLE XIII. 

Goodness of Fit. Samples of 10 from Population A . Fisher's Transformation. 


r 

r = tanh~ 1 r 

z-z 

Normal 

area 

4(l+«) 

1-00 

00 

00 

1-00000 

•95 

1-831 7808 

3*5042 

*99977 

•90 

1-472 2195 

2*5411 

•99447 

•85 

1-256 1528 

1*9623 

•97616 

•80 

1*098 6123 

1-5404 

•93827 

•75 

•972 9561 

1-2038 

•88570 

•70 

•867 3005# 

•9208 

•82147 

•65 

•775 2987 

*6743 

•74997 

•60 

•693 1472 

•4543 

•67608 

•55 

•618 3813 

•2640 

•60025 

•50 

•649 3061 

•0090 

•52751 

•45 

•484 7003 

- -1041 

•52141 

•40 

•423 6489 

- *2676 

•60560 

•35 

•365 4438 

- *4235+ 

•66403 

•30 

•309 5196 

- *5733 

•71679 

•25 

•256 4128 

- -7183 

•76362 

•20 

•202 7326 

- -8594 

•80483 

•15 

•151 1404 

- *9976 

•84074 

•10 

•100 3353 

-M337 

•87164 

•05 

•050 0417 

- 1-2684 

•89760 

•00 

•000 0000 

-1-4024 

•91962 

— *05 

-•050 0417 

-1-5365- 

•93779 

-•10 

-•100 3353 

-1-6712 

•96264 

-•15 




-•20 




- -25 




-•30 




-•35 




-•40 









•00023 

530 

1932 

3688 

5257 

6423 

7150 

7489 

7483 

7274 

6892* 

6409 

5853 

5276 

4683 

4121 

3591 

3080 

2606 


1817 

•01485 


•04736 


250A=/ 


f *06 
41*32 
U*83 
9*22 

13- 14 
16*06 

17- 88 

18- 72 
18*71 
18-18 
17-23 
16-02 

14- 63 
13-19 
11-71 
10-30 

8*98 

7*70 

6-52 

5-50 

/4*54 

\3-71 


11-84 


250*00 


Observed 

frequency 

/o 


fs 

4 

18 

14 
21 

17 
23 

18 

15 

14 
18 

15 
12 
11 

7 

7 

7 

7 

{» 
(2 
1 
i 3 

U 


250 


(fosH! 

f 


•01 

2-96 

1*80 

•26 

•64 

•16 

•98 

•00 

•29 

•25 

•71 

•26 

•01 

•05 

•44 

•06 

•04 

•41 

•19 


•68 


10-09 


$\bar{z} = .523562$, $\sigma_z = .373324$, χ² = 10.09, n = 19, P = 0.9503.
Note that we have a positive and a negative value of $(z - \bar{z})/\sigma_z$.


good first approximations*. If this is true for samples containing so few items as 5 or 10, it will assuredly hold true for larger samples.

The values of r in actual samples (n = 5 and n = 10) from a normal population having high correlation seem to follow the theoretical distribution curve.

* Cf. E. S. Pearson, "Some Notes on sampling Tests with two Variables," Biometrika, Vol. XXI (1929), pp. 337–360. In commenting on certain of his experiments, Pearson says, "These two series of results are of considerable interest and suggest that the normal bivariate surface can be mutilated and distorted to a remarkable degree without affecting the frequency distribution of r in samples as small as 20." (p. 357.)

See G. A. Baker, "The Significance of the Product-Moment Coefficient of Correlation with special reference to the Character of the Marginal Distributions," Journal of the American Statistical Association, Vol. XXV (1930), pp. 387–396. See also E. S. Pearson, "The Test of Significance for the Correlation Coefficient," Journal of the American Statistical Association, Vol. XXVI (1931), pp. 128–134. Read E. S. Pearson's comments on Baker's results. [The conclusion above seems not wholly consistent with the P's of Tables V and VI. Ed.]





TABLE XIV. 

Goodness of Fit. Samples of 10 from Population B. Fisher’s Transformation. 


r 

z = tanhr 1 r 

x - z 

~*7 

Normal 

area 

hi+«) 

A 

250A — / 

Observed 

frequenoy 

/. 

(/.-/)' 

/ 

1*00 
‘95 
*90 
•85 
•80 
•75 
•70 
*65 
•60 
•55 
•50 
•45 
•40 
,•35 
*30 
•25 
•20 
•15 
•10 
•05 
•00 
-•05 
- 10 
-•15 
-•20 
-•25 
-•30 
-•35 
-•40 
-•45 
-•50 

00 

1-831 7808 
1-472 2195 
1*256 1528 
1-098 6123 
•972 9551 
•867 3005 
•775 2987 
•693 1472 
•618 3813 
•549 3061 
•484 7003 
•423 6489 
•365 4438 
*309 5196 
•255 4128 
•202 7326 
•151 1404 
•100 3353 
•050 0417 
•ooooooo 

-•0500417 

- 1003353 

- 151 1404 

- -202 7326 

00 

35813 

2-6187 

2-0402 

1-6184 

1*2820 

•9992 

•7528 

•5329 

•3327 

•1478 

- -0252 

- -1886 

- *3445" 

- -4942 

- -6390 

- *7801 

- -9182 
-1-0542 
-1-1889 
-1-3228 
-1*4568 
-1-5915- 
-1-7275- 
-1-8656 

1-00000 

•99983 

•99590 

•97932 

•94721 

•90008 

•84110 

•77421 

•70298 

•63032 

•55875 

•51005 

•57480 

•63474 

•68934 

•73859 

•78233 

•82074 

•85410 

•88278 

•90708 

•92744 

•94425 

•95796 

•96895 

•00017 

393 

1658 

3211 

4713 

5898 

6689 

7123 

7266 

7157 

6880* 

6475 

5994 

5460 

4925 

4374 

3841 

3336 

2868 

2430 

2036 

1681 

1371 

•01099 

{■03105 

l 

( -04 
\ *98 
(4-14 
8*03 

11- 78 
14-74 

16- 72 

17- 81 

18- 16 
17*89 
17*20 
16-19 
14-98 
13-65 

12- 31 
10*94 

9*60 
8*34 
717 
6*08 
5*09 
f 4*20 
\ 3-43 
U-75 

7*76 

fi 

8 

15 

14 

15 

20 

26 

20 

15 

17 

7 

17 

17 

7 

8 

4 

5 

7-5 

6-5 

{! 

(2 

1 

1 

2 

ll 

•66 

•01 

. -88 
•37 
•18 
•27 
3*38 
•26 
•28 
•04 
4*25 
•82 
1-78 
1-42 
•27 
2*26 
•66 
•33 
*04 

1-10 

•07 






249-98 

250 

1932 


$\bar{z} = .494101$, $\sigma_z = .373515$, χ² = 19.32, n = 20, P = 0.502.

* Note that we have a positive and a negative value of $(z - \bar{z})/\sigma_z$.

Samples of 5 and of 10 were taken from coarsely grouped normal correlation tables. The value of ρ in these tables was 0.5 if Sheppard's correction was applied; without this correction it was somewhat less. The distribution of r in these samples seems to be essentially the same as the theoretical distribution of r in samples from a continuous population in which the value of ρ is that obtained from the correlation table without applying Sheppard's correction. This might be taken as indicating that Sheppard's correction should be applied to the sample†, although this appears only partially to make the distribution the same as that for samples from a continuous population. It is undoubtedly better to avoid coarse grouping.

† See R. A. Fisher, Statistical Methods for Research Workers, p. 152.








BIBLIOGRAPHY.

Baker, George A. The Significance of the Product-Moment Coefficient of Correlation with special reference to the Character of the Marginal Distributions. Journal of the American Statistical Association, Vol. XXV (1930), pp. 387–396.

Dunlap, Hilda Frost. An empirical Determination of the Distribution of Means, Standard Deviations and Correlation Coefficients drawn from Rectangular Populations. Annals of Mathematical Statistics, Vol. II, No. 1 (1931), pp. 66–81.

Ezekiel, Mordecai. Methods of Correlation Analysis. New York, Wiley (1930).

Ezekiel, Mordecai. The sampling Variability of linear and curvilinear Regressions. Annals of Mathematical Statistics, Vol. I (1930), pp. 275–333.

Fisher, R. A. Statistical Methods for Research Workers. Edinburgh and London, Oliver and Boyd, 1st edition, 1926; 2nd edition, 1928; 3rd edition, 1930.

Fisher, R. A. Frequency Distribution of the Values of the Correlation Coefficient in samples from an indefinitely large population. Biometrika, Vol. X (1914–15), pp. 507–521.

Pearson, Egon S. Some Notes on sampling Tests with two Variables. Biometrika, Vol. XXI (1929), pp. 337–360.

Pearson, Egon S. The Test of Significance for the Correlation Coefficient. Journal of the American Statistical Association, Vol. XXVI (1931), pp. 128–134.

Soper, H. E. On the Probable Error of the Correlation Coefficient to a second Approximation. Biometrika, Vol. IX (1913), pp. 91–115.

Soper, H. E.; Young, A. W.; Cave, B. M.; Lee, A.; and Pearson, K. On the Distribution of the Correlation Coefficient in small Samples. Appendix II to the papers of "Student" and R. A. Fisher. Biometrika, Vol. XI (1916–17), pp. 328–413.

"Student." Probable Error of a Correlation Coefficient. Biometrika, Vol. VI (1908–09), pp. 302–310.



THE PERCENTAGE LIMITS FOR THE DISTRIBUTION 
OF RANGE IN SAMPLES FROM A NORMAL 
POPULATION. (n<100.) 

By EGON S. PEARSON, D.Sc. 

The Table A given on p. 416 below represents an attempt to summarise in most convenient form for practical use the recent work on the distribution of range in samples from a normal population*. It deals only with the case of samples of 100 or less. The accompanying discussion may be divided into three parts:

(1) The method of computation of Table A. (See p. 416.)

(2) Experimental checks on the adequacy of the approximation involved.

(3) Illustrations of the use of Table A.

In addition to the value of mean and standard deviation, of which the former has already been completely and the latter partially tabled, the present Table gives certain percentage limits, namely the values of the range which will (a) not be attained, and (b) be exceeded, in 0.5%, 1%, 5%, and 10% of random samples. The unit is throughout the population standard deviation.

(1) The Method of Computation of Table A. 

It has been assumed that the sampling distribution of range may be adequately
represented by Pearson curves with the appropriate moment-coefficients. That this
assumption is not unreasonable will be seen from the experimental results presented
below, but since it has had to be made, no very high degree of accuracy is justified
in calculating the percentage limits from these curves. Nor indeed is a high degree
of accuracy required for practical purposes. The procedure adopted may be
summarised as follows:

(a) A framework was first obtained by finding the equations of the Type I
and Type VI curves, using the appropriate frequency constants (set out in Table VIII
of my paper referred to above), for samples of size

$n = 3, 4, 6, 10, 20, 60, 100.$

The first four of these curves were made to start at the point, range $= w = 0$, and
given the correct mean ($= \bar{w}$), standard deviation ($= \sigma_w$), and $\beta_1$. For the curves at
$n = 20$, 60 and 100 the start was not fixed and the first four theoretical moment

* L. H. C. Tippett, Biometrika, Vol. XVII. pp. 364-387. E. S. Pearson, Biometrika, Vol. XVIII.
pp. 173-194. "Student," Biometrika, Vol. XIX. pp. 151-164. Tables for Statisticians and Biometricians,
Part II, pp. cx-cxix.





coefficients were used: $\bar{w}$, $\sigma_w$, $\beta_1$ and $\beta_2$. The curves given by "Student"* were all
calculated by the second method; since however the distribution of range is abrupt
at the lower end when $n$ is small, it seemed probable that a better representation
of the true but unknown sampling distribution would be obtained in these cases
by making the approximative curve have the correct start. By the time $n = 20$ it
was considered of greater importance to use the correct $\beta_2$ rather than the correct
start.

For the case $n = 10$ the percentage limits were however found both from the
fixed start and the 4-moment curve, with the following results.


TABLE I.

                           Lower per cent. limits          Upper per cent. limits
                         0.5%     1%     5%    10%       10%     5%     1%   0.5%
Fixed start Curve        1.35   1.48   1.86   2.09      4.13   4.48   5.15   5.40
4-moment Curve           1.32   1.46   1.86   2.10      4.13   4.47   5.16   5.42


The figures provide some idea of the order of uncertainty involved in the method
of approximation used. The addition of a 3rd decimal place in the limits would
clearly be meaningless, but the retention of the 2nd decimal appears worth while†.

(b) For each of these framework curves the position of the ordinate at the
lower and upper tails cutting off 0.5%, 1.0%, 5.0%, and 10.0% of the total
frequency was found by quadrature and backward interpolation. If $w_p$ represents
the range value corresponding to any one of these ordinates, then the quantities

$l_p = (w_p - \bar{w})/\sigma_w$     (1)

were calculated. For a given per cent. limit, $p$, the value of $l_p$ will change with $n$,
that is to say with the $\beta_1$ and $\beta_2$ or shape of the sampling curve. But the change
is not very rapid, and it was found possible to interpolate in the framework so as
to find with the desired accuracy each of the 8 values of $l_p$ for

$n = 3, 4, 5, \ldots, 29, 30, 35, 40, \ldots, 95, 100$‡.

(c) Having calculated the $l_p$'s, it was only necessary to invert the formula (1),
and obtain $w_p$ from

$w_p = \bar{w} + l_p \sigma_w$     (2).

The complete set of values of $\bar{w}$ was given by Tippett, but $\sigma_w$ had only been
computed for $n$ = 2, 3, 4, 5, 6, 10, 20, 60 and 100. Three additional values were
therefore computed at $n$ = 30, 45 and 75 by the same process of cubature as that
employed by Tippett, with the following result:

Sample Size n                     30       45       75
Standard Deviation of Range     .6927    .6601    .6237
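Steps (b) and (c) amount to standardising each limit by the mean and standard deviation of range and later inverting that standardisation. The following is a minimal sketch of the arithmetic only; the function names are illustrative and not from the paper, and the figures used are the n = 10 mean and standard deviation from Table A together with the upper 1% limit of the fixed start curve in Table I.

```python
def standardised_limit(w_p, mean_range, sd_range):
    """Equation (1): l_p = (w_p - mean) / sd, everything in units of the population s.d."""
    return (w_p - mean_range) / sd_range


def range_limit(l_p, mean_range, sd_range):
    """Equation (2): w_p = mean + l_p * sd, the inverse of equation (1)."""
    return mean_range + l_p * sd_range


# n = 10: mean and s.d. of range (Table A) and the upper 1% limit (Table I).
MEAN_10, SD_10 = 3.07751, 0.797
l_upper_1pc = standardised_limit(5.15, MEAN_10, SD_10)
w_back = range_limit(l_upper_1pc, MEAN_10, SD_10)
print(round(l_upper_1pc, 2), round(w_back, 2))
```

In the paper the interpolation between framework sizes is carried out on the l_p values, not on the limits themselves, which is why the two functions are kept separate.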

* Loc. cit. p. 168.

† The Table on p. 162 of "Student's" paper referred to above gives the limits to the 1st decimal
place only.

‡ A graphical method of interpolation was used. A similar process was followed in finding the 1%
and 5% limits for $\sqrt{\beta_1}$ and $\beta_2$; see Biometrika, Vol. XXII. p. 247.


TABLE II.

Comparison of Observed and Theoretical Frequency Distributions.

(Range entries are central values of the groups.)

Range               n = 3         n = 4         Range              n = 10       Range              n = 20         n = 60
               Obs.  Theory  Obs.  Theory                   Obs.  Theory                   Obs.  Theory  Obs.  Theory

0.1               5     4.0     -     0.2    1.1 and less      -     1.3    1.9 and less      -     1.6
0.2              16     9.2     1     1.0    1.2               1     1.3    2.0               1     1.5
0.3              12    15.2     3     2.6    1.3               1     2.3    2.1               3     2.6
0.4              24    21.1     2     5.1    1.4               4     3.6    2.2               7     4.1
0.5              18    26.5     7     8.3    1.5               6     5.5    2.3               8     6.3
0.6              32    31.3     9    12.1    1.6               6     8.0    2.4               7     9.1
0.7              41    35.4    16    16.2    1.7               9    11.0    2.5              11    12.7
0.8              34    38.7    18    20.6    1.8              15    14.5    2.6              14    16.9
0.9              38    41.3    19    24.9    1.9              17    18.5    2.7              19    21.7
1.0              39    43.2    34    29.0    2.0              20    22.8    2.8              29    27.0
1.1              51    44.5    26    32.8    2.1              30    27.3    2.9              28    32.4
1.2              29    45.1    37    36.2    2.2              38    31.8    3.0              36    37.8
1.3              41    45.2    33    39.1    2.3              32    36.2    3.1              30    42.9
1.4              39    44.8    31    41.4    2.4              41    40.2    3.2              56    47.4
1.5              45    43.9    49    43.2    2.5              49    43.7    3.3              46    51.0     4     1.1
1.6              54    42.7    50    44.3    2.6              50    46.5    3.4              58    53.7     1     1.9
1.7              51    41.2    47    44.9    2.7              46    48.6    3.5              59    55.3     3     2.8
1.8              34    39.4    48    44.9    2.8              48    49.9    3.6              59    55.7     5     4.1
1.9              37    37.4    47    44.5    2.9              49    50.4    3.7              55    55.1     5     5.7
2.0              38    35.3    46    43.6    3.0              51    50.1    3.8              59    53.5     8     7.4
2.1              32    33.1    45    42.2    3.1              48    49.0    3.9              45    51.0    15     9.2
2.2              28    30.9    43    40.6    3.2              44    47.3    4.0              56    47.8     8    11.1
2.3              25    28.6    28    38.7    3.3              56    45.1    4.1              36    44.1    12    12.7
2.4              24    26.3    46    36.5    3.4              30    42.4    4.2              44    40.1    12    14.1
2.5              32    24.1    41    34.2    3.5              44    39.3    4.3              44    36.0    11    15.2
2.6              23    21.9    33    31.8    3.6              48    36.1    4.4              40    31.8    17    15.8
2.7              25    19.8    27    29.3    3.7              28    32.7    4.5              29    27.8    19    16.0
2.8              14    17.8    24    26.9    3.8              28    29.3    4.6              22    23.9    17    16.8
2.9              21    15.9    26    24.4    3.9              25    26.0    4.7              20    20.4    12    15.2
3.0              16    14.2    22    22.0    4.0              20    22.8    4.8              13    17.2    14    14.3
3.1              16    12.5    22    19.8    4.1              15    19.9    4.9              10    14.3    14    13.2
3.2              11    11.0     8    17.6    4.2              17    17.1    5.0              11    11.8    11    11.9
3.3              10     9.6    15    15.5    4.3              13    14.6    5.1              13     9.7    11    10.6
3.4               5     8.3    19    13.6    4.4              15    12.3    5.2               3     7.8     6     9.2
3.5               6     7.1     8    11.9    4.5              13    10.3    5.3               6     6.3     9     7.9
3.6               3     6.1    16    10.3    4.6              10     8.5    5.4               7     5.0     5     6.7
3.7               3     5.2    11     8.8    4.7               3     7.0    5.5               6     3.9    11     5.6
3.8               6     4.4     9     7.5    4.8               7     5.7    5.6               -     3.1     3     4.6
3.9               2     3.6     7     6.3    4.9               7     4.6    5.7               3     2.4     4     3.7
4.0               5     3.0     8     5.3    5.0               6     3.7    5.8               3     1.8     3     3.0
4.1               2     2.5     4     4.4    5.1               3     3.0    5.9               2     1.4     2     2.3
4.2               2     2.0     1     3.7    5.2               1     2.3    6.0               1     1.1     4     1.8
4.3               1     1.6     2     3.0    5.3               1     1.8    6.1               -     0.8     2     1.4
4.4               3     1.3     3     2.4    5.4               1     1.4    6.2               1     0.6     1     1.1
4.5               2     1.0     2     1.9    5.5               1     1.1    6.3               1     0.5     -     0.8
4.6 and more      6     2.8     7     6.5    5.6 and more      3     3.2    6.4 and more      -     1.2     1     2.5

Totals         1000  1000.0  1000  1000.0    Totals         1000  1000.0    Totals         1000  1000.0   250   250.0





From this framework the intermediate values of $\sigma_w$ shown in Table A were readily
obtained, with an error which it is believed should not be greater than a unit in
the 3rd decimal place. Finally the limits, $w_p$, were obtained from equation (2) and
are given in the Table*. How close these approximations are to the true values
we cannot at present tell, but the results described in the following section suggest
that they are not seriously in error.

(2) Experimental Checks on the Adequacy of the Approximation involved.

Tippett has given three experimental sampling distributions for $n$ = 5, 10 and
20, and has fitted theoretical curves to the last two. But he placed little weight on
the values of $\beta_1$ and $\beta_2$ used†; since then improved values have been suggested
and it is on these that the present Table A is based. As I had available seven
further series of samples, use was made of these for a fresh comparison. The series
consisted of 1000 random samples of sizes $n$ = 3, 4, 5, 7, 10, 15 and 20 drawn from
a normal population‡. The seven series were completely independent, but a further
series of 250 samples of 60 was obtained by combining together the samples of 15
in groups of 4. The observed and theoretical frequencies are compared in Table II,
for $n$ = 3, 4, 10, 20 and 60. The result of applying the test for goodness of fit is
shown below in Table III. In calculating $\chi^2$ small groups were combined at the
tails of the distributions so that none of the theoretical frequency groups contain
a frequency of less than 5. The brackets in Table II show the groupings used.


TABLE III.

Sample size          3        4       10       20       60

$\chi^2$           43.00    34.99    26.28    28.10    15.93
$n'$                 39       40       38       35       23
P                   .266     .653     .905     .752     .819
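The probabilities in Table III can be checked approximately from the recorded values of $\chi^2$. The sketch below assumes that the second figure for each sample size is the number of frequency groups after pooling (Elderton's n'), so that the degrees of freedom are one less; that reading is an assumption, as is the helper name `chi2_sf`. The tail probability uses the standard closed-form series for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with integer df exceeds x), from the closed-form series."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        term = math.exp(-y)            # j = 0 term of e^{-y} * sum(y^j / j!)
        total = term
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return total
    total = math.erfc(math.sqrt(y))    # the whole answer when df = 1
    term = math.sqrt(y) * math.exp(-y) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= y / (j + 0.5)
    return total


# sample size: (chi-square, number of groups n') as recorded in Table III
TABLE_III = {3: (43.00, 39), 4: (34.99, 40), 10: (26.28, 38),
             20: (28.10, 35), 60: (15.93, 23)}
p_values = {n: chi2_sf(chi2, groups - 1) for n, (chi2, groups) in TABLE_III.items()}
for n, p in p_values.items():
    print(n, round(p, 3))
```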


For the cases $n$ = 5, 7 and 15, for which the theoretical curves of the framework
were not available, only the expected and observed frequencies lying outside the
eight percentage limits given in Table A are shown (Table IV).

The least satisfactory agreement occurs when n = 3, where the curve appears to 
allow for too few cases at the two extremes, particularly at the lower limit. But 
apart from this there seems little evidence of any systematic differences, and taken 
as a whole the results encourage a reasoned confidence in the use of the percentage 
limits in Table A. 

* For the case $n = 2$, the distribution of $w$ is the half of a normal curve whose standard deviation
(if complete) is $\sqrt{2}$. The limits were found from Sheppard's Tables.

† Loc. cit. p. 378. Reasons for modifying Tippett's values of $\beta_1$ and $\beta_2$ were discussed by the present
writer in the paper referred to above.

‡ The samples of 3 were provided by Dr J. F. Tocher and those of 20 by Professor T. Hojo; the
remainder were drawn for me by Mr A. E. Stone. I take this opportunity of thanking them all heartily
for their assistance. In all cases the sampling was carried out with the aid of Tippett's Random
Sampling Numbers (Tracts for Computers, No. XV). The group breadth was the population standard
deviation.



TABLE IV.

Frequencies outside % Limits (n = 5, 7 and 15).

                          Lower Limits                  Upper Limits
                     0.5%    1%    5%   10%        10%    5%    1%  0.5%
Expected                5    10    50   100        100    50    10     5
Observed  n = 5         3    10    55    97         82    42    12     5
          n = 7         6    12    45   102         88    53    14    10
          n = 15        6    12    59   100        100    51    12     6


(3) Illustrations of the Use of Table A.

In applying statistical analysis to test the probability of a given hypothesis, there
will often be more than one method of procedure and more than one criterion
which may be used. Thus in testing on a sample or samples some hypothesis
concerning variation in the sampled population, we may use among possible criteria
either the standard deviation or the range. In so far as we know the sampling
distribution in both cases, either criterion is of equal value in controlling the risk
of rejecting the hypothesis tested when it is true. But in general when dealing
with normally distributed variables, tests on the standard deviation will be more
efficient in preventing the acceptance of a hypothesis which is false, than those
based upon the range. It must also be remembered that the theory assumes random
sampling from a homogeneous population, and a single anomalous individual is
more likely to throw out a result based upon range than one based upon standard
deviation. In certain tests however the range criteria are at less disadvantage than
in others, and because of their simplicity in application, if employed with judgment,
they will often prove to be extremely useful tools.

Example 1. In order to determine whether a given "lot" of a certain material
is up to specification, it may be necessary to consider not only the average value
of some character measured on each article, but also its standard deviation, $\sigma$. If
the lot be large, perhaps composed of several hundred articles, it is a common
commercial practice to estimate its nature by sampling. Suppose that it is wished
to fix a rejection limit for the variation permissible in the sample in such a way
that we are unlikely to reject a lot for which $\sigma \le a$, and unlikely to accept it if
$\sigma \ge ka$, where $a$ and $k$ (> 1) are to be given some appropriate values depending on
the quality of the material which we are prepared to accept. Let us define "unlikely"
as corresponding to a 1 in 100 chance. What size of sample will then be necessary
to ensure this result if we use (a) the sample range, and (b) the sample standard
deviation, to provide the estimate of variation?*

* We shall suppose that evidence is available that the character is distributed approximately
normally. Further that the size of the sample is small compared to the size of the lot, so that the sample
may be considered as drawn from an "infinite population."








(a) Using Range. Let $l(n, .01)$ be the lower and $l(n, .99)$ the upper 1% limit
obtained from Table A. Our rule will be to reject the lot when the sample range,
$w$, is $> w_0$. To determine $w_0$ we must find $n$ so that

$l(n, .99) \times a = w_0 = l(n, .01) \times ka$     (3).

The first equality will result in the long run in our rejecting a lot for which
$\sigma \le a$ at most 1 time in 100; the second, in our accepting a lot for which $\sigma \ge ka$ at
most 1 time in 100. Suppose now that $k = 2$. An examination of the Table shows
that for $n = 40$, $l(n, .99)/l(n, .01) = 6.09/2.97 = 2.05$; and for $n = 45$, the ratio is
$6.16/3.09 = 1.99$. The desired size of sample is therefore about 44, and $w_0 = 6.15a$.
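The search for n in (a) can be sketched as follows. The limits are the n = 40 and n = 45 rows of Table A; the linear interpolation between those two rows is an assumption made for illustration, since the paper simply reads the answer off the table, and the function names are hypothetical.

```python
# Lower and upper 1% limits of range (in units of sigma), Table A, n = 40 and n = 45.
LIMITS_1PC = {40: (2.97, 6.09), 45: (3.09, 6.16)}   # n: (lower 1%, upper 1%)


def limit_ratio(n):
    """Ratio l(n, .99) / l(n, .01), which must equal k in equation (3)."""
    lower, upper = LIMITS_1PC[n]
    return upper / lower


def interpolate_sample_size(k, n_lo=40, n_hi=45):
    """Linearly interpolate between two tabled sizes for the n at which the ratio equals k."""
    r_lo, r_hi = limit_ratio(n_lo), limit_ratio(n_hi)
    frac = (r_lo - k) / (r_lo - r_hi)
    n = n_lo + frac * (n_hi - n_lo)
    upper = LIMITS_1PC[n_lo][1] + frac * (LIMITS_1PC[n_hi][1] - LIMITS_1PC[n_lo][1])
    return n, upper          # upper is w0 expressed as a multiple of a


n_needed, w0_over_a = interpolate_sample_size(k=2.0)
print(round(limit_ratio(40), 2), round(limit_ratio(45), 2))
print(round(n_needed, 1), round(w0_over_a, 2))
```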

(b) Using Standard Deviation. Let $s$ be the sample standard deviation, and
$s_0$ the limiting value. The upper and lower 1% limits for $s$ may be found from the
tables of the $\chi^2$ integral, where $\chi^2 = ns^2/\sigma^2$ and in Elderton's notation $n' = n$*.
Using a similar notation to that of the range problem above, we must make $s_0$
satisfy the relation

$\chi^2(n, .99) \times \frac{a^2}{n} = s_0^2 = \chi^2(n, .01) \times \frac{k^2 a^2}{n}$     (4).

Again taking $k = 2$ for purposes of illustration, it will be found that for $n = 24$,
$\chi^2(n, .99)/\chi^2(n, .01) = 4.08$ and for $n = 25$ the ratio is 3.96. Consequently the
condition will be satisfied for a sample of 25, and $s_0^2 = 1.72a^2$. Method (b) has therefore
a clear advantage over method (a), for the gain in time in computation following
the use of range would hardly balance the loss involved in measuring 20 additional
articles.
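The figures in (b) can be reproduced without printed tables. The sketch below is an illustration under stated assumptions: the 1% points of $\chi^2$ are found by bisection on a closed-form tail probability for integer degrees of freedom, the degrees of freedom are taken as n - 1, and the helper names (`chi2_sf`, `chi2_point`) are hypothetical.

```python
import math


def chi2_sf(x, df):
    """P(chi-square with integer df exceeds x)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        term = math.exp(-y)
        total = term
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return total
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) * math.exp(-y) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= y / (j + 0.5)
    return total


def chi2_point(upper_tail, df):
    """Value of chi-square exceeded with probability upper_tail (bisection)."""
    lo, hi = 0.0, 10.0 * df + 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > upper_tail:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def limit_ratio(n):
    """Ratio of the upper to the lower 1% point for a sample of n (df = n - 1)."""
    return chi2_point(0.01, n - 1) / chi2_point(0.99, n - 1)


K = 2.0
n = 2
while limit_ratio(n) > K ** 2:      # equation (4) requires the ratio to fall to k^2
    n += 1
s0_sq_over_a_sq = chi2_point(0.01, n - 1) / n
print(n, round(limit_ratio(24), 2), round(limit_ratio(n), 2), round(s0_sq_over_a_sq, 2))
```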

Example 2. The use of range is however of greater value if a sample is divided
into small groups. Suppose that a sample of $N$ is broken up in a random manner
into $m$ sub-samples each containing $n$ observations, so that $N = mn$. Let $w_1, \ldots, w_m$
be the observed ranges of these sub-samples, and

$\bar{w} = (w_1 + w_2 + \ldots + w_m)/m$     (5)

be their mean value. We know that the expected values of the mean range and
standard deviation of range in repeated samples of $n$ are

$\mathrm{Mean}(w) = a_n \sigma, \quad \sigma_w = b_n \sigma$     (6),

where $a_n$ and $b_n$ are given in the second and third columns of Table A. It follows
that we may use as an estimate of the population standard deviation, $\sigma$,

$\sigma_2 = \bar{w}/a_n$     (7),

and that this estimate will have a standard error (S.E.)

S.E. of $\sigma_2 = \frac{1}{a_n}(\text{S.E. of } \bar{w}) = \frac{b_n \sigma}{a_n \sqrt{m}}$     (8).

* The 1% limits may be obtained directly without interpolation from the $\chi^2$ tables in R. A. Fisher's
Statistical Methods for Research Workers; the $n$ of these tables is the number of degrees of freedom = sample
size - 1.










Let us compare the reliability of this estimate with that obtained from the
standard deviation, $s$, of the whole $N = mn$ observations*. In this case we know
that if $\bar{s}$ is the mean and $\sigma_s$ the standard error of $s$ in repeated samples, then

$\bar{s} = c_N \sigma, \quad \sigma_s = d_N \sigma$     (9),

where the values of $c_N$ and $d_N$ for small samples have been tabled for the case of a
normal population†, and tend rapidly to 1 and $1/\sqrt{2N}$, respectively, as $N$ increases.
Thus we may use as an estimate of $\sigma$

$\sigma_1 = s/c_N$     (10),

which will have as its standard error

S.E. of $\sigma_1 = \frac{d_N \sigma}{c_N} = \frac{\theta_N \sigma}{\sqrt{2N}}$     (11),

where $\theta_N = \sqrt{2N}\, d_N/c_N \to 1$ as $N \to \infty$.

A comparison of the reliability of these two methods of estimating $\sigma$ is obtained
from the ratio

$\frac{\text{Standard Error of } \sigma_2}{\text{Standard Error of } \sigma_1} = \frac{1}{\theta_N} \cdot \frac{b_n \sqrt{2n}}{a_n} = \frac{\phi_n}{\theta_N}$     (12).

The following are a few values of $\theta_N$:

$N$              5      10      20      30      40      50     100
$\theta_N$     1.148   1.068   1.033   1.021   1.016   1.013   1.006

By taking $\theta_N = 1$ we shall slightly underestimate the reliability of $\sigma_2$ (compared
to $\sigma_1$), but the correction can be made if desired. A series of values of $\phi_n = \sqrt{2n}\, b_n/a_n$
is given in Table V. It is seen that the range method of estimation may be used
to best advantage by breaking up the observations into equal random sub-samples
of from 6 to 10 individuals. If this is done, the standard error of $\sigma_2$ = Mean range$/a_n$
will be approximately 1.15 times the standard error of $\sigma_1 = s$‡.
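The quantity $\phi_n = \sqrt{2n}\, b_n/a_n$ can be recomputed directly from the mean and standard deviation columns of Table A. The sketch below does so for a selection of sample sizes; the dictionary and function names are illustrative only, and the values are those printed in Table A.

```python
import math

# n: (a_n, b_n) = mean and standard deviation of range in units of sigma, from Table A.
TABLE_A = {
    2: (1.12838, 0.8525), 3: (1.69257, 0.8884), 4: (2.05875, 0.8798),
    5: (2.32593, 0.8641), 6: (2.53441, 0.8480), 7: (2.70436, 0.833),
    8: (2.84720, 0.820), 9: (2.97003, 0.808), 10: (3.07751, 0.797),
    12: (3.25846, 0.778), 15: (3.47183, 0.755), 20: (3.73495, 0.729),
}


def phi(n):
    """phi_n = sqrt(2n) * b_n / a_n, the S.E. of the range estimate relative to sigma / sqrt(2N)."""
    a_n, b_n = TABLE_A[n]
    return math.sqrt(2 * n) * b_n / a_n


phi_values = {n: round(phi(n), 3) for n in TABLE_A}
best_n = min(TABLE_A, key=phi)      # sub-sample size with the smallest phi_n among those listed
print(phi_values)
print(best_n)
```

Because the tabled standard deviations are given to three decimals only for n of 7 and above, the third decimal of the computed values is not to be relied on.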

TABLE V. 



* Nothing would be gained when using $s$ by dividing the $N$ observations into groups.
† Biometrika, Vol. X. p. 529. Tables for Statisticians and Biometricians, Part II, Table XVII.
‡ This is omitting the correcting factors $c_N$ and $\theta_N$.















The table may also be used to compare the reliabilities of different forms of
estimate of $\sigma$ made from range. For example, $N$ observations may be used

(a) as $m$ groups of $n$,

Estimate, $\sigma_2 = \bar{w}/a_n$;   S.E. $= \frac{b_n \sigma}{a_n \sqrt{m}} = \frac{\phi_n \sigma}{\sqrt{2N}}$     (13),

(b) as one group, so that $m = 1$,

Estimate, $\sigma_2' = w/a_N$;   S.E. $= \frac{b_N \sigma}{a_N} = \frac{\phi_N \sigma}{\sqrt{2N}}$     (14).

The ratio of the two standard errors of estimate is therefore $\phi_n/\phi_N$. For example,
the advantage of breaking a sample of 100 into 10 samples of 10 is clear, for
$\phi_{10}/\phi_{100} = 1.158/1.706 = 0.679$, or by using $\sigma_2$ rather than $\sigma_2'$ we obtain a reduction
in standard error of about 32%.

There is also another advantage. The sampling distribution of range is
asymmetrical but approaches most nearly to the normal when $n$ lies between 6 and 10*;
if $\beta_1$ and $\beta_2$ refer to the sampling distribution of $w$, then the coefficients for the
sampling distribution of $\bar{w}$, the mean range in $m$ samples (and therefore the coefficients
for $\sigma_2 = \bar{w}/a_n$), will be $B_1 = \beta_1/m$ and $B_2 = 3 + (\beta_2 - 3)/m$. Consequently the estimate
$\sigma_2$ will not only have the lowest standard error when $n$ has a value of 6 to 10, but
will also be more nearly normally distributed than if $n$ were larger.

Example 3. Reliability in estimation is closely associated with efficiency in
discrimination. The following figures represent 40 random variates obtained from
sampling a normal distribution with mean = 51 and standard deviation = 10 units.
Could we be sure that they were not drawn from a distribution in which $\sigma = 5$ units?

48, 54, 41, 53, 49; 51, 44, 34, 62, 54; 59, 39, 45, 57, 49;

44, 50, 57, 37, 50; 62, 57, 51, 59, 54; 35, 49, 36, 63, 46;

53, 41, 47, 39, 59; 54, 44, 61, 63, 44.

If the sample is treated as a whole, the lowest and highest variates are found to
be 34 and 63, giving a range of 29. For a sample of $n = 40$ from a distribution with
$\sigma = 5$, Table A shows that the upper 1% limit of range is $6.09 \times 5 = 30.45$. The
observed range is slightly less than this, so that the sample would be judged
exceptional but not clearly impossible if $\sigma$ were equal to 5. Suppose now the numbers
are broken up into 8 consecutive groups of 5 as shown by the semi-colons; the 8
ranges are now 13, 28, 20, 20, 11, 28, 20, 19. The upper 0.5% limit for samples of
5 from a distribution with $\sigma = 5$ is now $4.85 \times 5 = 24.25$, and 2 out of the 8 ranges
exceed this value. This alone suggests that the hypothesis, $\sigma = 5$, is unlikely, but
we may obtain more convincing evidence by comparing the mean range for the
8 samples, $\bar{w} = 19.87$, with the expected mean and its standard error for $\sigma = 5$.

We have   $\mathrm{Mean}(w) = a_n \times \sigma = 2.3259 \times 5 = 11.63$,

          $\sigma_{\bar{w}} = \sigma_w/\sqrt{m} = b_n \sigma/\sqrt{m} = .8641 \times 5/\sqrt{8} = 1.53$.

Since $(\bar{w} - \mathrm{Mean}(w))/\sigma_{\bar{w}} = 5.4$, there can now be little doubt whatever that the standard
deviation in the sampled population must have been greater than 5 units.
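The arithmetic of Example 3 is short enough to retrace in full. The sketch below uses the 40 variates as printed and the n = 5 and n = 40 constants from Table A; the variable names are illustrative only.

```python
import math

VARIATES = [48, 54, 41, 53, 49, 51, 44, 34, 62, 54, 59, 39, 45, 57, 49,
            44, 50, 57, 37, 50, 62, 57, 51, 59, 54, 35, 49, 36, 63, 46,
            53, 41, 47, 39, 59, 54, 44, 61, 63, 44]
A5, B5 = 2.32593, 0.8641          # Table A, n = 5: mean and s.d. of range
UPPER_1PC_N40 = 6.09              # Table A, n = 40: upper 1% limit
UPPER_HALF_PC_N5 = 4.85           # Table A, n = 5: upper 0.5% limit
SIGMA_0 = 5.0                     # hypothetical population s.d. under test

# Sample treated as a whole.
whole_range = max(VARIATES) - min(VARIATES)
whole_limit = UPPER_1PC_N40 * SIGMA_0

# Sample broken into 8 consecutive groups of 5.
groups = [VARIATES[i:i + 5] for i in range(0, len(VARIATES), 5)]
ranges = [max(g) - min(g) for g in groups]
n_exceeding = sum(r > UPPER_HALF_PC_N5 * SIGMA_0 for r in ranges)

mean_range = sum(ranges) / len(ranges)
expected = A5 * SIGMA_0
std_err = B5 * SIGMA_0 / math.sqrt(len(ranges))
deviate = (mean_range - expected) / std_err
print(whole_range, whole_limit, ranges, n_exceeding)
print(round(mean_range, 2), round(expected, 2), round(std_err, 2), round(deviate, 1))
```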

* Biometrika, Vol. XVIII. p. 191. Tables for Statisticians and Biometricians, Part II, p. cxvii.






Example 4. The figures given in Table VI represent the breaking-strength 
under tension in lbs. of small briquettes of cement-mortar*. Mixings of the material 
were carried out on each of 10 different days, and 6 briquettes formed from each 
mixing. The problem is to determine whether these 10 groups (or samples) of 
6 differ significantly either in mean strength or in variability. Other investigations 
have suggested that the variation in strength within a homogeneous sample is near 
enough to normal to justify the use of normal theory tests. 

TABLE VI.

Breaking Strength of Cement-Mortar Briquettes.

Group Number              1       2       3       4       5       6       7       8       9      10

Values of               390     578     530     623     596     345     488     625     722     800
breaking-strength       380     500     540     532     528     322     550     625     727     798
in lbs.                 445     470     470     547     540     312     530     610     690     750
                        360     520     560     600     562     358     568     616     700     724
                        375     530     460     600     590     375     420     610     705     720
                        350     450     470     594     530     310     530     600     700     726

Mean                  383.3   508.0   505.0   582.7   557.7   337.0   514.3   614.2   707.3   753.0
Variance              930.6  1733.3  1558.3  1032.6   748.6   588.0  2372.6    78.5   169.2  1150.3
Standard Deviation     30.5    41.6    39.5    32.1    27.4    24.2    48.7     8.9    13.0    33.9
Range                    95     128     100      91      68      65     148      25      37      80


In the first place there are two hypotheses to be tested: $H_1$ that the group
variances, and $H_2$ that the group means do not differ significantly. Let us apply to
the problem in turn statistical tests of increasing refinement.

Denote by $\bar{y}_t$, $s_t$ and $w_t$ the mean, standard deviation and range of the $t$th group;
by $k$ the number of groups (= 10); by $N$ the total number of observations (= 60);
by $n_t$ the number in the $t$th group (= 6); and by $\bar{y}_0$ and $s_0$ the mean and standard
deviation of the $N$ observations.

(a) Assume hypothesis $H_1$ to be true and test $H_2$. Crude Method using Range.
The mean of the 10 values of $w$ is found to be $\bar{w} = 83.7$ lbs. But for repeated samples
of 6 we see from Table A that $\mathrm{Mean}(w) = 2.53441 \times \sigma$; consequently we may obtain a
rough estimate of the assumed common group standard deviation, namely

$\sigma_2 = 83.7/2.53441 = 33.0$ lbs.     (15).

The lowest of the ten group-means is the 6th (337.0 lbs.), and the highest is the
10th (753.0 lbs.); this gives a range of 416.0 lbs. If the means differed only through
chance fluctuations, they would vary with a standard error of $\sigma/\sqrt{6}$, which using the
estimate $\sigma_2 = 33.0$ lbs. becomes 13.5 lbs. The observed range among the ten means

* I am indebted to Mr B. H. Wilson of the Building Research Station for permission to use these
data. The cement-mortar used on different days was obtained from different sources so that a difference
in mean strength was to be expected.






is 416.0/13.5 = 30.8 times this standard error. It is almost inconceivable that such
a ratio has occurred through chance, and we may therefore conclude that hypothesis
$H_2$ cannot be true, or that the mean strength differs very significantly from group
to group*.
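The crude range test of (a) reduces to a few lines of arithmetic. The sketch below works from the group means and ranges in the summary rows of Table VI and the n = 6 mean range from Table A; the variable names are illustrative only.

```python
import math

# Summary rows of Table VI (10 groups of 6 briquettes).
MEANS = [383.3, 508.0, 505.0, 582.7, 557.7, 337.0, 514.3, 614.2, 707.3, 753.0]
RANGES = [95, 128, 100, 91, 68, 65, 148, 25, 37, 80]
A6 = 2.53441                      # Table A, n = 6: mean range in units of sigma
GROUP_SIZE = 6

mean_range = sum(RANGES) / len(RANGES)
sigma_2 = mean_range / A6                       # equation (15)
se_of_mean = sigma_2 / math.sqrt(GROUP_SIZE)    # S.E. of a group mean
range_of_means = max(MEANS) - min(MEANS)
ratio = range_of_means / se_of_mean
print(round(mean_range, 1), round(sigma_2, 1), round(se_of_mean, 1), round(ratio, 1))
```

The ratio is then compared with the n = 10 row of Table A, whose upper 0.5% limit is 5.40.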

It should be noted that when lack of homogeneity has been established either
by the range test or otherwise, we may next question whether this is due to the
presence in the series of one or more anomalous individual (or group) values. For
this purpose Irwin's tables of the probability integral of the distances between the
1st and 2nd, and between the 2nd and 3rd, individuals in samples from a normal
population will be of assistance†.

(b) Assume $H_1$ true, and test $H_2$. Exact method. In this case we avoid the use
of range and obtain an estimate of $\sigma^2$ from the group variances $s_t^2$, namely

$\sigma_1^2 = \Sigma(n_t s_t^2)/(N - k) = 1243.43$, or $\sigma_1 = 35.26$ lbs.     (16).

Actually we may avoid the labour of calculating the separate values of $s_t^2$ by
using the identity

$\Sigma(n_t s_t^2) = N s_0^2 - \Sigma n_t(\bar{y}_t - \bar{y}_0)^2$     (17).

It is found that

$\eta^2 = \Sigma n_t(\bar{y}_t - \bar{y}_0)^2/(N s_0^2) = .9351$     (18)

and is clearly significant, again showing that hypothesis $H_2$ is untenable‡.
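The quantities in (16) and (18) can be recovered from the summary rows of Table VI. The sketch below takes the printed group means and variances as given (so small rounding differences from the paper are to be expected) and uses identity (17) to form the total sum of squares; the variable names are illustrative only.

```python
import math

# Summary rows of Table VI (10 groups of 6 briquettes); variances use divisor n_t.
MEANS = [383.3, 508.0, 505.0, 582.7, 557.7, 337.0, 514.3, 614.2, 707.3, 753.0]
VARIANCES = [930.6, 1733.3, 1558.3, 1032.6, 748.6, 588.0, 2372.6, 78.5, 169.2, 1150.3]
N_T = 6
K = len(MEANS)
N = K * N_T

within = sum(N_T * v for v in VARIANCES)              # Sigma n_t s_t^2
sigma1_sq = within / (N - K)                          # equation (16)
grand_mean = sum(MEANS) / K                           # valid because the groups are equal in size
between = sum(N_T * (m - grand_mean) ** 2 for m in MEANS)
total = between + within                              # N s_0^2, by identity (17)
eta_sq = between / total                              # equation (18)
print(round(sigma1_sq, 2), round(math.sqrt(sigma1_sq), 2), round(eta_sq, 4))
```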

(c) To test $H_1$. Method using Range. We now wish to determine whether
the variation of breaking-strength within each group of 6 changes from group
to group more than might be expected through chance. Using the estimate,
$\sigma_2 = 33.0$ lbs., based upon the mean range we may ask whether the ten group-ranges
given at the bottom of Table VI differ significantly. The position can be studied
in the upper diagram of Fig. 1, which shows the ten values of range represented
by black circles, and also the lower and upper percentage limits obtained by
multiplying by 33.0 the figures taken from the row $n = 6$ of Table A. The small
figures above the circles refer to the corresponding group numbers of Table VI. It
will be seen that 3 ranges out of 10 lie beyond the two 5% limits; the expectation
is 1 out of 10.

The standard deviation of range for samples of 6 is $.8480 \times \sigma$; using $\sigma_2$ as the
estimate of $\sigma$, we find

$\sigma_w = 33.0 \times .8480 = 28.0$ lbs.     (19).

* This result is of course obvious from a mere inspection of the figures, but the example illustrates
the method of attack. Table A shows us that the range among ten means should only exceed 5.40 × standard
error through chance on 5 occasions in 1000; the experiment gives a factor of 30.8!

† Biometrika, Vol. XVII. pp. 236-250. Tables for Statisticians and Biometricians, Part II, pp. cv-cx
and Tables XIX and XX.

‡ For $N = 60$, $k = 10$ we should expect in samples, were $H_2$ true, mean $\eta^2 = .1526$, $\sigma_{\eta^2} = .0651$. (See Woo's
Tables, Tables for Statisticians and Biometricians, Part II, p. 17.)









The observed standard deviation of the 10 ranges is larger than this, namely
35.7 lbs.

These results suggest that a more critical analysis is desirable as the variation
in group range may perhaps be significant.
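The comparison in (c) can be retraced as follows. The sketch uses the ten group ranges from Table VI and the n = 6 row of Table A; taking the observed standard deviation of the ranges with divisor 10 is an assumption, chosen because it reproduces the printed figure, and the variable names are illustrative only.

```python
import math

RANGES = [95, 128, 100, 91, 68, 65, 148, 25, 37, 80]   # bottom row of Table VI
A6, B6 = 2.53441, 0.8480            # Table A, n = 6: mean and s.d. of range
LOWER_5PC, UPPER_5PC = 1.26, 4.04   # Table A, n = 6: lower and upper 5% limits

mean_range = sum(RANGES) / len(RANGES)
sigma_2 = mean_range / A6
lower, upper = LOWER_5PC * sigma_2, UPPER_5PC * sigma_2
outside = [w for w in RANGES if w < lower or w > upper]

expected_sd = B6 * sigma_2                               # equation (19)
observed_sd = math.sqrt(sum((w - mean_range) ** 2 for w in RANGES) / len(RANGES))
print(round(lower, 1), round(upper, 1), sorted(outside))
print(round(expected_sd, 1), round(observed_sd, 1))
```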

(d) Improvement on (c). If test (b) has been used in examining the means,
the estimate $\sigma_1 = 35.26$ lbs. is preferable to $\sigma_2 = 33.0$ lbs., and should be used in (c).
In the present instance, however, the change will scarcely affect the position.


(e) To test $H_1$. Method using Variance. If it is decided to calculate the
individual group variances, $s_t^2$, we may form a diagram showing the position of the
10 values with regard to the percentage limits in exactly the same manner as for
the range. This is shown in the lower half of Fig. 1. For samples of $n$ from a
normal distribution, $s^2$ is distributed according to the law

$y = y_0 (s^2)^{\frac{n-3}{2}} e^{-\frac{n s^2}{2\sigma^2}}$     (20).

[Fig. 1. Upper diagram: 1. Range, on a scale of range in lbs. (20 to 200). Lower diagram: 2. Variance, on a
scale of variance in (lbs.)² (400 to 3600). In each diagram the ten group values are plotted as numbered
points against the lower and upper percentage limits.]


The limits may be found either from the Tables of the Incomplete Gamma
Function, or from those of $\chi^2$ by writing $\chi^2 = ns^2/\sigma^2$. A comparison of the two diagrams
shows the small difference in the relative position of the sample points with regard to
their scales. Range and variance are in fact highly correlated in small samples,
and in the present case the analysis of range, which is far the more rapidly carried
out, is probably as useful for the purpose as the analysis of variance.

Both tests as described provide a picture which is of value in forming a judgment 
on the situation, but neither can lead to an exact measure of probability. For in 





each case we must substitute for the unknown $\sigma$ an estimate (whether $\sigma_1$ or $\sigma_2$)
depending on the individual samples. This disadvantage is overcome in the
following test (f).

(f) To test $H_1$. Use of the Criterion of Likelihood. In a paper recently
published elsewhere*, Dr J. Neyman and the writer have discussed a test developed
from the principle of likelihood. It will perhaps be of interest to conclude by stating
briefly the result which it leads to in the present problem. The criterion suggested
may be defined as

$L = \prod_{t=1}^{k} (s_t^2)^{n_t/N} \Big/ s_a^2$     (21),

where   $s_a^2 = \Sigma(n_t s_t^2)/N$     (22).

$L$ is therefore the ratio of the weighted geometric mean to the weighted
arithmetic mean of the group variances, and is independent of the unknown $\sigma$.
When the variation is normal, the moments of the sampling distribution of $L$ (if
hypothesis $H_1$ be true) have been found. In the simple case in which the groups
contain the same number of individuals, i.e. when $n_1 = n_2 = \ldots = n_k = N/k = n$, say,
the $p$th moment coefficient of $L$ about zero is

$\mu_p' = k^p \left\{ \frac{\Gamma\left(\frac{n-1}{2} + \frac{p}{k}\right)}{\Gamma\left(\frac{n-1}{2}\right)} \right\}^k \frac{\Gamma\left(\frac{N-k}{2}\right)}{\Gamma\left(\frac{N-k}{2} + p\right)}$     (23).

Reasons are given in the paper referred to for believing that the distribution
of $L$ may be represented approximately in many cases by a Type I curve of form

$y = y_0 L^{m_1 - 1}(1 - L)^{m_2 - 1}$     (24)

with the correct mean and standard deviation. That is to say, $m_1$ and $m_2$ are to be
determined from the first two moment coefficients in (23) as follows

$m_1 = \mu_1'(\mu_1' - \mu_2')/(\mu_2' - \mu_1'^2)$,
$m_2 = (1 - \mu_1')(\mu_1' - \mu_2')/(\mu_2' - \mu_1'^2)$     (25).

The hypothesis $H_1$ becomes less and less likely as $L \to 0$, and the chance of
obtaining $L \le L_0$ may be found by any method giving the values of the Incomplete
B-Function.

For the present example, in which $N = 60$, $k = 10$, $n = 6$, it is found from (21)
that

$L = .702$     (26).

Further $m_1 = 21.82$ and $m_2 = 4.54$.
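The criterion and the two curve constants can be recomputed from the group variances in Table VI. The sketch below assumes equal group sizes, so that the weighted geometric and arithmetic means reduce to simple ones, and evaluates the moment coefficients through log-gamma functions in the form $\mu_p' = k^p \{\Gamma(\tfrac{n-1}{2} + \tfrac{p}{k})/\Gamma(\tfrac{n-1}{2})\}^k\, \Gamma(\tfrac{N-k}{2})/\Gamma(\tfrac{N-k}{2} + p)$; the function and variable names are illustrative only.

```python
import math

# Group variances (divisor n_t = 6) from the summary rows of Table VI.
VARIANCES = [930.6, 1733.3, 1558.3, 1032.6, 748.6, 588.0, 2372.6, 78.5, 169.2, 1150.3]
N_T, K = 6, 10
N = N_T * K

# Criterion L: geometric mean over arithmetic mean of the group variances.
geometric_mean = math.exp(sum(math.log(v) for v in VARIANCES) / K)
arithmetic_mean = sum(VARIANCES) / K
L = geometric_mean / arithmetic_mean


def moment(p, n=N_T, k=K):
    """p-th moment coefficient of L about zero for k equal groups of n."""
    half_df = (n - 1) / 2.0
    log_mu = (p * math.log(k)
              + k * (math.lgamma(half_df + p / k) - math.lgamma(half_df))
              + math.lgamma(k * half_df) - math.lgamma(k * half_df + p))
    return math.exp(log_mu)


mu1, mu2 = moment(1), moment(2)
spread = mu2 - mu1 ** 2
m1 = mu1 * (mu1 - mu2) / spread            # first constant of the Type I curve
m2 = (1.0 - mu1) * (mu1 - mu2) / spread    # second constant of the Type I curve
print(round(L, 3), round(m1, 2), round(m2, 2))
```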

We may now apply R. A. Fisher's $z$-transformation to (24), and write

$z = \tfrac{1}{2} \log_e \frac{n_2 (1 - L)}{n_1 L}$     (27),

$n_1 = 2m_2 = 9.08, \quad n_2 = 2m_1 = 43.64$     (28).

* "On the Problem of k Samples." Bulletin de l'Académie Polonaise des Sciences et des Lettres.
Série A. Sciences Mathématiques, 1931, pp. 460-481.










416 Percentage Limits for the Distribution of Range in Samples 


TABLE A.

Percentage Limits for the Distribution of Range in Samples
from a Normal Population.

Size of               Standard           Lower Limits                    Upper Limits
Sample      Mean      Deviation    0.5%     1%     5%    10%       10%     5%     1%   0.5%

   2      1.12838      .8525        .01    .02    .09    .18      2.33   2.77   3.64   3.97
   3      1.69257      .8884        .17    .22    .45    .63      2.92   3.34   4.10   4.36
   4      2.05875      .8798        .38    .47    .77    .98      3.26   3.65   4.38   4.65
   5      2.32593      .8641        .59    .70   1.04   1.26      3.49   3.87   4.59   4.85
   6      2.53441      .8480        .78    .89   1.26   1.49      3.67   4.04   4.74   5.00
   7      2.70436      .833         .95   1.07   1.44   1.68      3.82   4.18   4.87   5.13
   8      2.84720      .820        1.10   1.22   1.60   1.83      3.94   4.29   4.98   5.23
   9      2.97003      .808        1.23   1.36   1.74   1.97      4.04   4.39   5.07   5.32
  10      3.07751      .797        1.35   1.48   1.86   2.09      4.13   4.48   5.15   5.40
  11      3.17287      .787        1.47   1.59   1.97   2.20      4.21   4.55   5.22   5.47
  12      3.25846      .778        1.57   1.69   2.07   2.30      4.28   4.62   5.28   5.53
  13      3.33598      .770        1.66   1.78   2.16   2.39      4.35   4.68   5.34   5.59
  14      3.40676      .762        1.74   1.86   2.24   2.47      4.41   4.74   5.39   5.64
  15      3.47183      .755        1.82   1.94   2.32   2.54      4.47   4.79   5.44   5.69
  16      3.53198      .749        1.89   2.01   2.39   2.61      4.52   4.84   5.49   5.73
  17      3.58788      .743        1.95   2.08   2.46   2.67      4.57   4.89   5.53   5.77
  18      3.64006      .738        2.01   2.14   2.52   2.73      4.61   4.93   5.57   5.81
  19      3.68896      .733        2.07   2.20   2.57   2.79      4.65   4.97   5.61   5.85
  20      3.73495      .729        2.13   2.25   2.62   2.84      4.69   5.01   5.64   5.89
  21      3.77834      .724        2.18   2.31   2.67   2.89      4.73   5.05   5.68   5.92
  22      3.81938      .720        2.23   2.36   2.72   2.93      4.77   5.08   5.71   5.95
  23      3.85832      .716        2.28   2.40   2.77   2.98      4.80   5.11   5.74   5.98
  24      3.89535      .712        2.33   2.45   2.81   3.02      4.83   5.14   5.76   6.00
  25      3.93063      .709        2.37   2.49   2.85   3.06      4.86   5.17   5.79   6.03
  26      3.96432      .705        2.41   2.53   2.89   3.10      4.89   5.20   5.82   6.05
  27      3.99654      .702        2.45   2.57   2.93   3.13      4.92   5.23   5.84   6.08
  28      4.02741      .699        2.49   2.61   2.96   3.17      4.95   5.25   5.87   6.10
  29      4.05704      .696        2.52   2.65   3.00   3.20      4.97   5.28   5.89   6.12
  30      4.08562      .693        2.56   2.69   3.04   3.24      5.00   5.30   5.91   6.15
  35      4.21322      .681        2.72   2.84   3.18   3.38      5.11   5.41   6.01   6.24
  40      4.32156      .670        2.85   2.97   3.31   3.50      5.20   5.50   6.09   6.32
  45      4.41544      .660        2.97   3.09   3.42   3.61      5.28   5.57   6.16   6.39
  50      4.49815      .652        3.07   3.19   3.51   3.70      5.35   5.64   6.23   6.45
  55      4.57197      .645        3.17   3.28   3.59   3.78      5.42   5.70   6.29   6.51
  60      4.63856      .639        3.25   3.36   3.67   3.86      5.48   5.76   6.34   6.56

65 

4-69916 

•633 

3-32 

3*43 

3-74 

3-93 

5-53 

5-81 

6-38 

6*61 

65 

70 

4*75472 

•628 

3*39 

3*50 

3-81 

3-99 

5*58 

5-86 

6*43 

6-65 

70 

75 

4-80598 

•624 

3-45 

3*56 

3*87 

4*05 

5*63 

5*91 

6-47 

6*69 

75 

80 

4-85355 

•619 

3-51 

3*62 

3-92 

4-10 

5*67 

6-95 

6-50 

6*72 

80 

85 

4-89789 

•616 

3*57 

3*67 

3-97 

4-15 

5*71 

5-98 

6-54 

6-76 

85 

90 

4-93940 

•612 

3-62 

3-72 

4-02 

4-19 

5*75 

6-02 

6-57 

6-79 

90 

95 

4-97841 

•608 

3*67 

3*77 

4*06 

4-24 

5*78 

6-05 

6*60 

6-82 

95 

100 

6-01519 

•605 

3*71 

3*81 

4-11 

4*28 

5*81 

6*08 

6*63 

6*85 

100 
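The "Mean" column of Table A can be checked independently. The sketch below is not the method by which the table was built; it assumes the standard identity that the mean range of n observations equals the integral over x of $1 - F(x)^n - (1 - F(x))^n$, with F the unit normal probability integral, and evaluates it by the trapezoidal rule. The function names and the integration limits are choices made for this illustration.

```python
import math


def normal_cdf(x):
    """Probability integral of the unit normal curve."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mean_range(n, lo=-10.0, hi=10.0, steps=20000):
    """Mean range in samples of n from a unit normal population,
    by trapezoidal integration of 1 - F^n - (1 - F)^n."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        p = normal_cdf(lo + i * h)
        f = 1.0 - p ** n - (1.0 - p) ** n
        total += f if 0 < i < steps else 0.5 * f   # half weight at the ends
    return total * h
```

For n = 2 the integral has the closed value $2/\sqrt{\pi}$, which gives a convenient check on the first row.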






The 5% and 1% limits for z may then be found by interpolating in his
tables* with the following results:

5% point: $z = .371$, $L = .696$,

1% point: $z = .519$, $L = .630$ ......(29).

The observed value, $L = .702$, lies close to the 5% point; or differences in
group variation as large or larger than those observed might be expected to arise 
through chance in about 1 experiment in 20. We can hardly therefore feel 
confident that they are significant, and find no reason to be dissatisfied by the 
cruder picture provided by the range test. 
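The passage between z and L may be sketched as follows. It is assumed here (the printed form of the transformation is not legible in this copy) that $z = \tfrac12 \log_e F$ with $F = n_2(1-L)/(n_1 L)$, $n_1 = 2m_2$ and $n_2 = 2m_1$; this assumed form reproduces the two pairs of values quoted above to three decimals. The function names are illustrative.

```python
import math


def l_from_z(z, m1, m2):
    """Value of L corresponding to a given z (assumed relation, see lead-in)."""
    n1, n2 = 2.0 * m2, 2.0 * m1
    return n2 / (n2 + n1 * math.exp(2.0 * z))


def z_from_l(L, m1, m2):
    """Inverse of l_from_z."""
    n1, n2 = 2.0 * m2, 2.0 * m1
    return 0.5 * math.log(n2 * (1.0 - L) / (n1 * L))


M1, M2 = 21.82, 4.54                      # exponents quoted in the text
l_5_percent = l_from_z(0.371, M1, M2)     # tabled 5% point of z
l_1_percent = l_from_z(0.519, M1, M2)     # tabled 1% point of z
z_observed = z_from_l(0.702, M1, M2)      # observed L
```

On this reading the observed L corresponds to a z a little below the 5% point, in agreement with the conclusion drawn in the text.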

In conclusion I would like to acknowledge much friendly assistance received 
in preparing this paper. Table A is built essentially on the earlier work of
Mr L. H. C. Tippett, who has also lent me certain unpublished material which was 
of much value in its construction. “Student” too has given me many helpful 
suggestions, and the work was undertaken in the first instance with the knowledge 
that both Tippett and he had proved the utility of the range criterion in certain fields 
of practical application. 

In addition I am most grateful to Mr M. R. El-Shanawany for a large part of 
the lengthy computation of the framework distributions for Table A; to Dr L. J. 
Comrie for supervising the computation of $\sigma_w$ for the cases n = 45 and 75; and to
“Mathetes” for the loan of the working sheets on which the theoretical distributions 
contained in “Student’s” 1927 paper on Routine Analysis were based. 


* Statistical Methods for Research Workers, pp. 212-215.




THE CONVERSE OF SPEARMAN’S TWO-FACTOR THEOREM. 

By BURTON H. CAMP, Wesleyan University, Middletown, Conn., U.S.A. 


1. Introduction. There have been several attempts to prove, or to disprove, 
the converse of Spearman's two-factor theorem *. As shown by Irwin, these various 
methods of proof result ultimately in the same expression for the so-called general 
factor. Although the several proofs are necessarily alike in many respects, the 
different authors appear to have different pictures in mind at the background of 
their analytical demonstrations. The same may be said of the proof presented in 
this paper. In addition, I have inserted a certain necessary but hitherto neglected†
hypothesis, have investigated the possibility of the use of other than linear functions 
as the basis of the formation of the general factor function, have given a numerical 
example in which this factor is not unique, and have discussed more fully the 
important additional question raised by Piaggio as to whether this factor is 
“almost” unique. Let us begin with the neglected hypothesis. 

2. The Net Correlations. It will first be shown that Spearman's proof does 
not hold in a specific case. The reason it fails is that he has produced a certain 
determinant as a definition of his general factor without proving that there exists 
a frequency distribution for which such a definition would be possible. In presenting 
this example I shall use his notation, to which the reader may wish to refer, unless 
he chooses to read the later sections of this paper first. My notation, and Spearman's, 
will be explained in the sections following. 

Let $r_{ab}$, $r_{ac}$, etc. be as in the following table:

         a       b       c       d
a        -      .8      .5      .35
b       .8       -      .7      .49
c       .5      .7       -      .306
d       .35     .49     .306     -


Evidently, this set forms a perfect hierarchy, thus satisfying Spearman's only
hypothesis. He now defines $r_{ag}$ so that

$r_{ag}^2 = r_{ak} r_{aq} / r_{kq}$,


* C. Spearman, The Abilities of Man (1927), Macmillan Company, Appendix, pp. ii-vii. E. B.
Wilson, Proceedings of the National Academy of Sciences (U.S.A.), Vol. xiv. (1928), pp. 283-291.
H. T. H. Piaggio, Nature, January 10, 1931, p. 56. J. O. Irwin, British Journal of Psychology, Vol. xxii.
(1932), pp. 359-363. Cf. also J. C. M. Garnett, same volume, pp. 364-372.
† Except by Wilson, whose methods are quite different.





where q, k and a are chosen arbitrarily from the letters a, b, c, ..., except that the
same letter must not be chosen twice. In particular, now, let a = b, k = c, q = a.
Then

$r_{bg} = \sqrt{r_{bc} r_{ba} / r_{ca}} = \sqrt{(.7)(.8)/.5} > 1$,

which is impossible. The tacit assumption is made, therefore, that

$r_{ca} - r_{cb} r_{ba} > 0$,

and we are led to infer that this inequality should be predicated for all permitted
choices of the subscripts, or, to put it another way, that none of the net correlations
$r_{ca.b}$ should be permitted to be negative. In fact, this condition is really
necessary as will be proved later. On the other hand, I presume that it is true that
it would almost certainly be satisfied automatically in the kind of practical problems
to which the theory is currently applied, and so it may be that it is only from the
logician's standpoint that it is necessary to insist on it.
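The arithmetic of this example is easily verified. The sketch below takes the six correlations of the table (with $r_{bd} = .49$), forms the three tetrad differences, the quantity $r_{bc} r_{ba}/r_{ca}$, and the numerator of the net correlation $r_{ca.b}$; the variable names are illustrative only.

```python
# Correlations of the four-variable example in Section 2.
r = {("a", "b"): 0.8, ("a", "c"): 0.5, ("a", "d"): 0.35,
     ("b", "c"): 0.7, ("b", "d"): 0.49, ("c", "d"): 0.306}


def corr(i, j):
    """Symmetric lookup of a correlation coefficient."""
    return r[(i, j)] if (i, j) in r else r[(j, i)]


# A hierarchy requires the tetrad differences to vanish (here to rounding).
tetrads = [
    corr("a", "b") * corr("c", "d") - corr("a", "c") * corr("b", "d"),
    corr("a", "b") * corr("c", "d") - corr("a", "d") * corr("b", "c"),
    corr("a", "c") * corr("b", "d") - corr("a", "d") * corr("b", "c"),
]

# Square of the would-be correlation of b with the general factor.
r_bg_squared = corr("b", "c") * corr("b", "a") / corr("c", "a")

# Numerator of the net correlation of c and a with b held constant.
net_numerator = corr("c", "a") - corr("c", "b") * corr("b", "a")
```

The tetrad differences vanish to the accuracy of the table, yet the "correlation" with g would have a square of 1.12, and the net correlation is negative; the two failures are the same fact.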

3. The General Theory. Consider (for definiteness) the aptitudes $a_{\alpha,\beta}$ of
$\beta = 1, 2, ..., N$ individuals in $\alpha = 1, 2, ..., n$ studies. We may always write

(a) $a_{\alpha,\beta} = C_\alpha g_\beta + s_{\alpha,\beta}$,

where C, g and s are functions, to be determined, of their subscripts. (I now
depart from Spearman's notation. His r, g, s and a are the same, respectively, as
my C, g, s, and $a_\alpha$.) The variable a may be supposed, without loss of generality, to
have a mean equal to zero, and a $\sigma$ equal to unity, for each fixed value of $\alpha$, as $\beta$
ranges from 1 to N.

(b) Let the correlation between $a_i$ and $a_j$, and the net correlation $r_{ij.k}$, be
positive or zero for every set of mutually different values of i, j and k.

(c) Let the totality of $r_{ij}$'s constitute a hierarchy, i.e. let parallel arrays of the
following determinant be proportional, if the series of 1's which constitutes the
principal diagonal be omitted from consideration:

$\Delta = \begin{vmatrix}
r_{11} & r_{12} & r_{13} & \cdots & r_{1n} \\
r_{21} & r_{22} & r_{23} & \cdots & r_{2n} \\
\cdots & \cdots & \cdots & \cdots & \cdots \\
r_{n1} & r_{n2} & r_{n3} & \cdots & r_{nn}
\end{vmatrix}$

For convenience, we shall also suppose this determinant to have been so arranged 
(as is always possible) that 

The converse of the two-factor theorem now states that it is possible so to
determine C, g and s that, for the given N individuals, and for all mutually
different subscripts k, l,

$r_{s_k s_l} = 0$ ......(i),

$r_{s_k g} = 0$ ......(ii),

and this is what we are now required to show.







Proof. There exists by hypothesis an n-way correlation solid, that is, a distribution
in n dimensional space, determined by the n aptitudes $a_\alpha$, $\alpha = 1, 2, ..., n$,
of the N individuals, and the total correlations in this solid are as indicated in $\Delta$.
It is necessary to assume that $\Delta$ is not zero, but this is a trivial assumption, for the
vanishing of this determinant would mean that the n total regression "planes" of
this solid would have a "line" in common, instead of the general mean point only,
and in other geometrical ways this distribution would be so peculiar as to be unrealisable
in practice. We shall now show that there exists also a distribution of
these N individuals in (n + 1)-way space, determined by the given N values of each
of the n variables $a_1, ..., a_n$, and by the N values to be assigned to a single new
variable g; thus the mean g is zero, the total correlations are as indicated in the
determinant R of (iv), and also, for R, condition (c) holds, even when the leading
element $r_{gg} = 1$ is included, and thus, finally, for all mutually different values of k and l,

$r_{kl} = r_{gk} r_{gl}$ ......(iii),

$R = \begin{vmatrix}
r_{gg} & r_{g1} & r_{g2} & \cdots & r_{gn} \\
r_{1g} & r_{11} & r_{12} & \cdots & r_{1n} \\
r_{2g} & r_{21} & r_{22} & \cdots & r_{2n} \\
\cdots & \cdots & \cdots & \cdots & \cdots \\
r_{ng} & r_{n1} & r_{n2} & \cdots & r_{nn}
\end{vmatrix}$ ......(iv).


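First, such a set of positive numbers does exist, and none are greater than unity;
for, put $r_{gg} = 1$, and let $r_{g1}$, $r_{g2}$ and $r_{g3}$ be determined by the equations

$r_{12} = r_{g1} r_{g2}$, $r_{13} = r_{g1} r_{g3}$, $r_{23} = r_{g2} r_{g3}$ ......(v).

The solution of (v) is

$r_{g1} = \sqrt{r_{12} r_{13}/r_{23}}$, $r_{g2} = \sqrt{r_{12} r_{23}/r_{13}}$, $r_{g3} = \sqrt{r_{13} r_{23}/r_{12}}$,

and hence by (b) the three numbers $r_{g1}$, $r_{g2}$, $r_{g3}$ are positive and not greater than 1.
Then, if l > 3, let

$r_{gl} = r_{1l} \cdot \dfrac{1}{r_{1g}} = \sqrt{\dfrac{r_{1l}}{r_{13}} \cdot \dfrac{r_{1l}}{r_{12}} \cdot r_{23}}$.

Since $r_{1l} \le r_{13}$ and $r_{1l} \le r_{12}$ and $r_{23} \le 1$, it follows that

$r_{gl} \le 1$ ......(vi).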

It also follows from these equations that 

r gi “ > eto -> 

v r k i 

for all mutually different values of k and Z, and so, adopting here the notation of 
Piaggio and others, we have 

r i 1 = z ..(vii). 

ri rn 














Burton H. Camp 


421 


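All these interrelationships may be represented compactly by the following matrix,
M, in which parallel arrays are proportional, now without the exception of the
principal diagonal:

$M = \begin{pmatrix}
1 & r_{g1} & r_{g2} & r_{g3} & \cdots & r_{gn} \\
r_{1g} & r_{g1}^2 & r_{12} & r_{13} & \cdots & r_{1n} \\
r_{2g} & r_{21} & r_{g2}^2 & r_{23} & \cdots & r_{2n} \\
\cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\
r_{ng} & r_{n1} & r_{n2} & r_{n3} & \cdots & r_{gn}^2
\end{pmatrix}$ ......(viii).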
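A small numerical sketch may make the construction of M concrete. It takes three intercorrelations (those of the first three tests in the example of Section 6, namely $4/\sqrt{35}$, $4/5$, $4/\sqrt{35}$), solves the three equations (v) for the correlations with g, and builds a matrix whose parallel arrays are proportional. The function names are illustrative, and the proportionality test by 2 × 2 minors is a choice made here, not a step of the paper.

```python
import math


def loadings_from_triad(r12, r13, r23):
    """Solve r12 = x1*x2, r13 = x1*x3, r23 = x2*x3 for positive x1, x2, x3."""
    return (math.sqrt(r12 * r13 / r23),
            math.sqrt(r12 * r23 / r13),
            math.sqrt(r13 * r23 / r12))


def build_m(loadings):
    """Matrix with leading element 1 and every element a product of two loadings."""
    v = [1.0] + list(loadings)
    return [[vi * vj for vj in v] for vi in v]


def arrays_proportional(m, tol=1e-12):
    """Parallel arrays are proportional when every 2 x 2 minor vanishes."""
    n = len(m)
    return all(abs(m[i][k] * m[j][l] - m[i][l] * m[j][k]) <= tol
               for i in range(n) for j in range(n)
               for k in range(n) for l in range(n))


r12, r13, r23 = 4 / math.sqrt(35), 4 / 5, 4 / math.sqrt(35)
x1, x2, x3 = loadings_from_triad(r12, r13, r23)
m = build_m((x1, x2, x3))
```

Here the solution is $2/\sqrt5$, $2/\sqrt7$, $2/\sqrt5$, and the off-diagonal elements of m reproduce the three given correlations.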


So far, we have proved that there actually does exist a set of numbers as
indicated in M, and that each of them is positive and at most equal to unity, but
we have not shown that they can all be correlation coefficients. That depends on
whether there exists an (n + 1)-way frequency distribution whose total correlations
are these numbers, and this is the important thing it now remains for us to
establish. A simple but rather helpful picture of the method is afforded by a case
of three variables. Consider a two-way frequency distribution between x and y.
This may be represented by a set of points on a horizontal (xy) plane. Of course
$r_{xy}$ is known. Now suppose we have the problem of constructing a three-way
distribution in xyz space, having the same $r_{xy}$ and also some pre-assigned $r_{yz}$ and
$r_{zx}$. It is clear that we may move the points vertically at pleasure without disturbing
x or y or $r_{xy}$. If we do this in such a way as to produce the proper $r_{yz}$
when the points are projected on the yz plane, it may be that we will not thus
have produced the proper $r_{zx}$ when they are projected on the zx plane. The
question whether we can do both at once depends on how many points there are
and how many distinct conditions must be fulfilled; and the most direct way of
solving the problem would seem to be, therefore, to write out the conditions and
count them. This is what we now propose to do, but our points exist initially in
space of n dimensions instead of two. The (n + 1)th direction is the direction of g.
The number of points is N. Each of these is to be moved in the g direction to its
proper position, and the co-ordinate of that new position is also to be called g. The
co-ordinates of the N points, as located ultimately, are subject to n conditions only,
viz.

$\frac{1}{N}\sum_\beta g\,a_i = \sigma_g r_{gi}$, (i = 1, 2, ..., n) ......(ix),


where 




and from them the function g is to be determined. Since there are n conditions,
the most natural function of the a's to assume as the form of g is the linear combination

$y = A_1 a_1 + ... + A_n a_n$ ......(x),


but this is by no means necessary. Any other function involving n arbitrary 
constants might do as well, or better, and we shall see later that we may start with 
vastly more general functions. It happens too that this linear function is not quite 

sufficient by itself, because when inserted in (ix) it would lead to n equations in
the A's which would not have a solution. Due to the presence of $\sigma_g$, these equations
are not linear in the A's. In order to meet this new difficulty, it is necessary to
examine it more closely. Solve Equations (ix) inserting g = y,

$\frac{1}{N}\sum_\beta (A_1 a_1 + ... + A_n a_n)\,a_i = \sigma_g r_{gi}$, (i = 1, 2, ..., n),

first as if $\sigma_g$ were a known number, noting that they become

$A_1 r_{11} + A_2 r_{12} + ... + A_n r_{1n} = r_{g1}\sigma_g$,
..........
$A_1 r_{n1} + A_2 r_{n2} + ... + A_n r_{nn} = r_{gn}\sigma_g$ ......(xi).

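Due to the proportional relationships that exist among the r's,

$\Delta = \dfrac{1}{K^2}\prod_{i=1}^{n}(1 - r_{gi}^2)$,

and the solution is

$A_i = \sigma_g K^2 \dfrac{r_{gi}}{1 - r_{gi}^2}$, (i = 1, 2, ..., n) ......(xii),

where

$K^2 = \dfrac{1}{1 + \dfrac{r_{g1}^2}{1 - r_{g1}^2} + ... + \dfrac{r_{gn}^2}{1 - r_{gn}^2}}$.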

This is not a true solution, because $\sigma_g$ is a function of the A's, but (xii) is a more
illuminating form in which (xi) may be written. Together with (xii) we must now
satisfy also the equation $\sigma_g = \sigma_y$, where, summing for all individuals:

$\sigma_y^2 = \frac{1}{N}\sum_\beta (A_1 a_1 + ... + A_n a_n)^2$ ......(xiia).


It will be shown that this cannot be done, except in the trivial case where all the
A's vanish. By (xiia),

$\sigma_y^2 = (A_1^2 + ... + A_n^2) + 2(A_1 A_2 r_{12} + ... + A_{n-1} A_n r_{n-1,n})$,

and by (xii),

$\sigma_y^2 = \sigma_g^2 K^4 \left\{ \sum_i \dfrac{r_{gi}^2}{(1 - r_{gi}^2)^2} + 2\sum_{i<j} \dfrac{r_{gi}^2 r_{gj}^2}{(1 - r_{gi}^2)(1 - r_{gj}^2)} \right\} = \sigma_g^2 (1 - K^2)$ ......(xiii).

Therefore, we cannot satisfy both (xii), which is the same as (xi), and the condition
that $\sigma_y^2 = \sigma_g^2$, i.e. we cannot let g = y, unless K = 0. The extreme case where K is
zero, or near it, is discussed in Section 5. It is the condition for an "almost"
unique g.
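The relation between the coefficients A and the constant K can be illustrated numerically. The sketch assumes that the solution of (xi) takes the form $A_i = \sigma_g K^2 r_{gi}/(1 - r_{gi}^2)$ with $1/K^2 = 1 + \sum r_{gi}^2/(1 - r_{gi}^2)$, and that $r_{ij} = r_{gi} r_{gj}$ off the diagonal; these forms are reconstructed from the hierarchy conditions and are checked below against the figures of the example in Section 6. The function names are illustrative.

```python
import math


def k_squared(loadings):
    """K^2 for a hierarchy with the given correlations r_gi (assumed form)."""
    return 1.0 / (1.0 + sum(r * r / (1.0 - r * r) for r in loadings))


def regression_weights(loadings, sigma_g=1.0):
    """Coefficients A_i of the linear function y (assumed form of (xii))."""
    k2 = k_squared(loadings)
    return [sigma_g * k2 * r / (1.0 - r * r) for r in loadings]


def variance_of_y(loadings, sigma_g=1.0):
    """Sum of A_i A_j r_ij, with r_ii = 1 and r_ij = r_gi r_gj otherwise."""
    a = regression_weights(loadings, sigma_g)
    total = 0.0
    for i, ri in enumerate(loadings):
        for j, rj in enumerate(loadings):
            r_ij = 1.0 if i == j else ri * rj
            total += a[i] * a[j] * r_ij
    return total


loadings = [2 / math.sqrt(5), 2 / math.sqrt(7)] * 2   # example of Section 6
k2 = k_squared(loadings)
weights = regression_weights(loadings)
var_y = variance_of_y(loadings)
```

With these loadings $K^2 = 3/35$ and the variance of y is $32/35 = 1 - K^2$, so that y by itself falls short of unit variance by exactly $K^2$.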

Having moved our points up to the position occupied by the plane

$y = A_1 a_1 + ... + A_n a_n$ ......(xiv),














and having found that they satisfied all the requirements, except that the standard
deviation was too small, the obvious thing to do next would be to move them away
from this plane so as to secure the correct standard deviation but not disturb what
has already been accomplished. So we make this plane the regression plane of the
points in their final position, denoting by t their distances (parallel to g) from it.
This gives us a good deal of latitude in our choice of t, but we are restricted somewhat.
Let g = t + y, where, for each i,

$\sum_\beta t = 0$, $\sum_\beta a_i t = 0$ ......(xv).

Then the condition (ix) will yield (xi) as before, and

$\sigma_g^2 = \sigma_t^2 + \sigma_y^2$ ......(xvi),

and so we must choose t so that both (xv) and (xvi) will be satisfied; then (xii)
will yield the solution. This may not be the only way in which one may obtain a g
which will have the necessary requirements, but it is certainly one way. By (xiii),
condition (xvi) is the same as

$\sigma_g^2 = \sigma_t^2 + \sigma_g^2 (1 - K^2)$,

and hence $\sigma_t = K\sigma_g$ ......(xvii).

This choice of $\sigma_t$ is arbitrary in so far as $\sigma_g$ is arbitrary, but in order that g may
be compared with the a's it is desirable to have $\sigma_g = 1$, and so $\sigma_t = K$, and
$\sigma_y = \sqrt{1 - K^2}$.
Thus there is a definite restriction on the freedom of t, and, as stated earlier, it
has to do with the almost uniqueness of g. Postponing that subject, we now complete
the proof. It proceeds from here as indicated by Spearman and others, and
so may be indicated very briefly. It remains only to show that (i) and (ii) are
satisfied. It follows from (iii) that the net correlation, $r_{kl.g} = 0$, if k and l are
different. The total regressions of $a_k$ and $a_l$ on g are: $a_k = r_{gk} g$ and $a_l = r_{gl} g$, and
it is known from general correlation theory that this net correlation is the simple
correlation between $(a_k - r_{gk} g)$ and $(a_l - r_{gl} g)$. Set these equal to $s_k$ and $s_l$ respectively,
and let $C_\alpha = r_{g\alpha}$, and it follows that we have so chosen the variables of
(a) § 3 that (i) is satisfied:

$r_{s_k s_l} = 0$, $k \ne l$.

To obtain (ii), compute the correlation between $s_k$ and g. Since

$\sum_\beta (a_k - g r_{gk}) = 0$ and $\sigma_{s_k}^2 = \frac{1}{N}\sum_\beta (a_k - g r_{gk})^2 = 1 - r_{gk}^2$ ......(xviii),

$r_{s_k g} = \dfrac{1}{N\sigma_{s_k}}\sum_\beta (a_k - g r_{gk})\,g = \dfrac{1}{\sqrt{1 - r_{gk}^2}}(r_{gk} - r_{gk}) = 0$,

as desired.

4. Necessity. We shall now show the necessity of the condition in (b),
namely, that it follows from the two-factor theorem and from this theorem only that
$r_{ij} \ge r_{ik} r_{jk}$. It is quite well known that it follows from the two-factor theorem in the
direct (not the converse) form that

$r_{gk}^2 = \dfrac{r_{ik} r_{jk}}{r_{ij}}$,

where $r_{gk}$ is a correlation coefficient which is positive and at most equal to unity;
so $r_{ij} \ge r_{ik} r_{jk}$.

5. Interpretation. As stated in Section 3, the values of g may be assigned to 
the N individuals in many ways, at least provided N is greater than n + 1. This 
means that we may choose, for example, for the general factor applicable to the 
first individual a value twice as great or half as great as the one assigned to the 
second individual, at pleasure, and still meet all the conditions of this theorem. 
This sort of a general factor is not, I should think, what psychologists desire in 
order to establish the two-factor theory of mental abilities. If the number of ways in 
which one may assign values of the general factor to the N individuals is very large 
instead of unique, it would seem to me doubtful whether from a psychological 
point of view it would be meaningful to assert the existence of such a “ factor ” at 
all, but that is obviously a question for psychologists. That it is true, however, 
that the general factor is strikingly indeterminate is well established in the 
numerical example of Section 6. Consider the 11th and the 12th individuals
($\beta$ = 11, 12) of that example. For the 11th, the general factor equals 0.85 for one
choice of t's, -1.77 for another; for the 12th, these figures are reversed. The scale
($\sigma$) of these measurements is unity and the origins are at their means. The disparity
is perhaps the more striking because both these individuals have been
assigned the same scores on all four of the tests. It would not be relevant to assert
that I had made a peculiar choice of t's in order to secure these differences, that a
random choice would probably not have produced them, for the choice of t's is not
restricted to be random; it is arbitrary. The so-called arbitrary constant of integration
affords a perfect analogy. Given an hierarchy, one can assert the existence
of a general factor in exactly the same sense as, when given an analytic function, 
one can assert the existence of its anti-derivative or “indefinite” integral; there 
will exist a family of general factor functions, every member of which would lead 
to the given hierarchy. 

Admitting this, Piaggio and Irwin have asserted that nevertheless the general 
factor g is what one might call “ almost ” unique, in the sense that the possible 
fluctuations of g from y are small on the average, at least in certain circumstances. 
I shall discuss this fluctuation in a moment, but let us consider first a possible 
demurrer. If their contention is justified, it means only that, for any choice of t's,
the average deviation of g from y is small; each member of the infinite family of 
surfaces of which y is the common regression plane lies on the average close to that 
plane. Not every point of it needs to be close to that plane; whatever individual 
be selected, it is possible to find at least one member of that family for which his g 
will differ greatly (if N be much greater than n) from y. For any given hierarchy, 
then, we may write down a whole family of two-factor patterns, some of which will 
differ widely from each other in the case of any previously selected individual. 
However, if the contention of these authors be justified, it does follow that for each 
member of that family the majority of the individuals must have g's close to y. We
do know that our hierarchy must have associated with it a set of g's which determine
one or another member of this family, and this set as a whole lies close to y.
Therefore, the group behaviour of the N individuals might be expected to be as if
g were unique. So an investigation of the fluctuation of g is warranted.

We have seen that, if $\sigma_g = \sigma_a = 1$, then $\sigma_t = K$, $\sigma_y = \sqrt{1 - K^2}$, $g = t + y$. It is a
question whether the approximation, g = y, is a close one, and this depends on
whether ($\sigma_t = K$) is small, relative to $\sigma_g$, say as small as 0.1. In terms of
the r's,

$K^2 = \dfrac{1}{1 + \dfrac{r_{g1}^2}{1 - r_{g1}^2} + ... + \dfrac{r_{gn}^2}{1 - r_{gn}^2}}$.

There are two ways in which 1/K can be as large as 10. One is when the early r's
are very close to 1. From the equation,

$a_{1\beta} = r_{g1} g_\beta + s_{1\beta}$,

we get $\sum_\beta a_1^2 = r_{g1}\sum_\beta g\,a_1 + \sum_\beta s_1 a_1$.

Hence, using (xviii), $1 = r_{g1}^2 + r_{s_1 a_1}\sigma_{s_1}$, $r_{s_1 a_1} = \sqrt{1 - r_{g1}^2}$.

So, if $r_{g1}$ is close to 1, $r_{s_1 a_1}$ is close to zero. Moreover, for values of k different from
1 it follows from the earlier results (§3, i, and ii), that $r_{s_k a_1} = 0$ exactly. Therefore,
$a_1$ has by hypothesis all the properties of the general factor sought, approximately.
In other words, this is a trivial case where the g of the two-factor theorem has
already been found, approximately, among the aptitudes measured.

The other case is where n is so large that, even if each term of $1/K^2$ is small,
yet their sum is large. For example, if $r_{kl} = 0.5$, for all k, l, $r_{gk}^2 = 0.5$, and

$\dfrac{1}{K^2} = 1 + 1 + ... + 1$,

so that 1/K = 10 if n = 99. To accomplish the same result when $r_{kl} = 0.25$ would
require the measurement of 297 aptitudes. Practically, it would appear difficult to
secure cases of hierarchies among aptitudes as numerous as this with intercorrelations
as large, but if they should occur, it would then be true that the group general
factor would be "almost" unique. It is to be remembered that for the success of
our theory the number of individuals N must exceed the number of tests n*.


* The relationship between Piaggio’s notation and mine is as follows. His a — m a g + w a s 0 , where 
<r a -l, a Q ~\, <r*-l. My « tt = C a <7 + * tt , where <r n = l, <r g = 1, <r* a - vl -r,* 2 . His g=.k 2 t + hi, where 

s/l-k* 

a t = — ~, <r,= l. 

My 0-V + tt where <r t —J i - A' s , <r t =K. HU fi=my u. His k=my K. Thus his indeterminate part of 
g is ki and mine iB t. Now he says that by increasing his N (my ?/), the coefficient of the uncertainty 
term, k , “can be made as small as we please,” and concludes from that that g may become almost 
unique. This does not seem conclusive, for if his k is less than 1, his k* is still smaller and that is the 
coefficient of the determined term (his t % my y). Irwin rightly judges that it is a matter of standard 
deviations as well as of coefficients, but he says that Ar 3 can be made small at pleasure. This looks like 
a misprint for k s , for, as noted before, it is the smallness of ft 3 which is needed to render g almost 
unique. 

According to Wilson, any set of n aptitudes which do not lead to a hierarchy may be replaced by n 
artificial aptitudes, which are certain combinations of the given aptitudes, which do lead to a hierarchy. 
Since n may be as large as desired, it would follow that there always exists a set of n artificial aptitudes 
for which there is a group general factor which is almost unique. 





This argument for uniqueness has proceeded from the assumption that the form
of the function g was g = y + t, where y was linear and t had special characteristics,
but it was remarked that it was not necessary, initially, that y should be linear.
Indeed, returning to (x), we might have written $g = \phi + t$, where

$\phi = A_1 f_1 + ... + A_n f_n$,
and/i was any function of (fy $ such that 

2/<-0 and (i* 1, ..., n). 

p p 

In that case, values of $A_i$ could have been found, usually, subject to the same conditions
(ix) as before, and from that point on the original proof would have held
good. But, nevertheless, in that case, the set of values found for g would still have
constituted some sort of a frequency distribution in (n + 1) space and this distribution
would have had a regression plane. Also, it is known from general correlation
theory that the differences between the ordinates to such a plane and the values of
g thus found would have had the properties previously ascribed to t:

$\sum_\beta t = 0$, $\sum_\beta a_i t = 0$.

Hence this new g could be represented as the sum of a linear function and such a
t; and so it has now been shown that our original solution was in fact the most
general one possible.

6. Example. For the individuals $\beta$ = 1 to 40, let the a's be the observations,
having the values indicated in the table. It follows that the determinant R (see iv)
is as below, indicating a hierarchy of $r_{ij}$'s and the $r_{gi}$'s associated with it. The y's
are determined from the equations

$y = \dfrac{6}{7\sqrt5}(a_1 + a_3) + \dfrac{2}{5\sqrt7}(a_2 + a_4)$,

$A_1 = A_3$, $A_2 = A_4$, N = 40, n = 4.



β          a_1       a_2       a_3       a_4       y (in units of c)    y

1-5        2c        2c        c         c         12c                  1.37
6-10       0         2c        c         c         6c                    .69
11-15      -c        -c        0         0         -4c                  -.46
16-20      -c        -c        -2c       0         -10c                -1.14
21-25      -c        -c        -c        -3c       -10c                -1.14
26-30      c         -c        c         c         6c                    .69
31-35      c         c         c         c         8c                    .91
36-40      -c        -c        -c        -c        -8c                  -.91

Σ          0         0         0         0                               .01
σ          1         1         1         1                               .91
c          2/√5      2/√7      2/√5      2/√7      4/35











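$R = \begin{vmatrix}
1 & 2/\sqrt5 & 2/\sqrt7 & 2/\sqrt5 & 2/\sqrt7 \\
2/\sqrt5 & 1 & 4/\sqrt{35} & 4/5 & 4/\sqrt{35} \\
2/\sqrt7 & 4/\sqrt{35} & 1 & 4/\sqrt{35} & 4/7 \\
2/\sqrt5 & 4/5 & 4/\sqrt{35} & 1 & 4/\sqrt{35} \\
2/\sqrt7 & 4/\sqrt{35} & 4/7 & 4/\sqrt{35} & 1
\end{vmatrix}$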


The t's must now be chosen so as to satisfy the relations (xv) and the condition
that $\sigma_t^2 = 3/35$. Two simple choices are:

Case I: $t = \sqrt{12/7} = 1.31$ for $\beta = 11$, and $t = -1.31$ for $\beta = 12$, t = 0 for all
other $\beta$'s.

Case II: $t = -1.31$ for $\beta = 11$, and $t = 1.31$ for $\beta = 12$, t = 0 otherwise. Thence
g = y + t, and then the s's are determined so as to satisfy the following equations
for each $\beta$:

$a_1 = 2g/\sqrt5 + s_1$, $a_2 = 2g/\sqrt7 + s_2$,

$a_3 = 2g/\sqrt5 + s_3$, $a_4 = 2g/\sqrt7 + s_4$.

Some of the values of g and $s_1$ are approximately as in the next table.


           Case I                Case II
β          g         s_1         g         s_1

10          .69      -.62         .69      -.62
11          .85     -1.65       -1.77       .69
12        -1.77       .69         .85     -1.65
13         -.46      -.48        -.46      -.48

σ          1          .45        1          .45
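The whole example may be recomputed as a check. The sketch below rebuilds the forty sets of scores from the table (the entry for $a_2$ in the block 36-40 is taken as -c, which is what a zero mean and unit standard deviation require), forms y with the coefficients $6/(7\sqrt5)$ and $2/(5\sqrt7)$, adds the t's of Case I and Case II, and computes the specific factors. All names are illustrative.

```python
import math

C5, C7 = 2 / math.sqrt(5), 2 / math.sqrt(7)        # units c for the two kinds of test
# Scores in units of c for the eight blocks of five individuals.
UNITS = [(2, 2, 1, 1), (0, 2, 1, 1), (-1, -1, 0, 0), (-1, -1, -2, 0),
         (-1, -1, -1, -3), (1, -1, 1, 1), (1, 1, 1, 1), (-1, -1, -1, -1)]
a = [[u[0] * C5, u[1] * C7, u[2] * C5, u[3] * C7] for u in UNITS for _ in range(5)]
N = len(a)                                          # 40 individuals
cols = [[row[i] for row in a] for i in range(4)]    # one list per test
LOAD = [C5, C7, C5, C7]                             # correlations of the tests with g
A = [6 / (7 * math.sqrt(5)), 2 / (5 * math.sqrt(7))] * 2
y = [sum(A[i] * row[i] for i in range(4)) for row in a]


def mean_product(u, v):
    """Mean product of two series of N values (means are zero here)."""
    return sum(p * q for p, q in zip(u, v)) / N


def general_factor(sign):
    """g = y + t, with t = +/- sqrt(12/7) for individuals 11 and 12 only."""
    t = [0.0] * N
    t[10], t[11] = sign * math.sqrt(12 / 7), -sign * math.sqrt(12 / 7)
    return [yi + ti for yi, ti in zip(y, t)]


def specifics(g):
    """s_i = a_i - r_gi * g for each test."""
    return [[cols[i][b] - LOAD[i] * g[b] for b in range(N)] for i in range(4)]


g_case_1, g_case_2 = general_factor(+1), general_factor(-1)
s_case_1 = specifics(g_case_1)
```

Both choices of t give a g of unit standard deviation, correlated with the four tests as R requires, and specific factors uncorrelated with g and with one another; yet the 11th and 12th individuals exchange the values 0.85 and -1.77.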




THE DISTRIBUTION OF THE INDEX IN A NORMAL 
BIVARIATE POPULATION. 

By E. C. FIELLER, B.A.


The Probability Integral of the Index Distribution. 
Consider the distribution of the ratio

$v = y/x$

in any bivariate population $z = f(x, y)$

where

$\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f(x, y)\,dx\,dy = 1$ ......(1).

Points (x, y) corresponding to a given value of v lie on the line

$y = vx$ ......(2);

hence the chance that a random member of the population (1) will have an index
lying in the range $v_1 \le v < v_2$ is equal to the volume of the portion of the frequency
surface (1) that lies above the area swept out in the xy-plane by the line (2) as it
revolves in the positive direction from the position

$y = v_1 x$ ......(3)

to the position $y = v_2 x$ ......(4).

If we take $v_1 = -\infty$, this volume is the chance that the index will not exceed
$v_2$; the line (3) is then the y-axis, and the volume is

$V = \left( \int_0^{\infty}\int_{-\infty}^{v_2 x} + \int_{-\infty}^{0}\int_{v_2 x}^{\infty} \right) f(x, y)\,dy\,dx$ ......(5).

When the joint distribution of x and y is the normal one, 


z = 




2ir<r x <r y \/i — r® 

(5) gives, for the chance of an index not greater than v, 


■-L h 


e 


1 1 (** <r r zi, vt\ 

2 1 - r* V , 1 <r x <ty + 1 Ty*) dxdy 


l2ir<T lt <Ty Vl -r® 

where a and b are the two portions of the xy-plane indicated in Fig. 1.
The boundaries of a and b are the lines 

X + X ss 0 , 

y + y = v(x + x), 


(6), 

( 7 ), 












From the last identity we have 

<r* a (l - p 2 )- *«Vy*(1 - i*)/(<r y * - 2 rva^cry + v*<r*)> 

1 - = K “ - rl ); 

1 — p 

squaring the last of these equations, and multiplying by the other two, we get 


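$\rho^2 = (r\sigma_y - v\sigma_x)^2/(\sigma_y^2 - 2rv\sigma_x\sigma_y + v^2\sigma_x^2)$ ......(10),

whence $1 - \rho^2 = (1 - r^2)\,\sigma_y^2/(\sigma_y^2 - 2rv\sigma_x\sigma_y + v^2\sigma_x^2)$ ......(11),

$\sigma_\xi = \sigma_x$ ......(12),

$\sigma_\eta = (\sigma_y^2 - 2rv\sigma_x\sigma_y + v^2\sigma_x^2)^{1/2}$ ......(13),

and $\sigma_\xi\sigma_\eta\sqrt{1 - \rho^2} = \sigma_x\sigma_y\sqrt{1 - r^2}$ ......(14).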
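Relations (10) to (14) are those of a linear change of variables, and may be checked numerically. The sketch assumes that the new variables are $\xi = x$ and $\eta = y - vx$ (the defining page is not legible in this copy), computes their standard deviations and correlation from those of x and y, and leaves the identities to be compared. The names and the trial values are illustrative.

```python
import math


def transformed_moments(sx, sy, r, v):
    """Standard deviations and correlation of xi = x and eta = y - v*x,
    for x, y with standard deviations sx, sy and correlation r."""
    var_eta = sy * sy - 2 * r * v * sx * sy + v * v * sx * sx
    s_eta = math.sqrt(var_eta)
    # Covariance of x with y - v*x is r*sx*sy - v*sx^2; divide by sx*s_eta.
    rho = (r * sy - v * sx) / s_eta
    return sx, s_eta, rho


sx, sy, r, v = 2.0, 3.0, 0.4, 0.7            # arbitrary trial values
s_xi, s_eta, rho = transformed_moments(sx, sy, r, v)
lhs_11 = 1 - rho ** 2
rhs_11 = (1 - r ** 2) * sy ** 2 / s_eta ** 2
lhs_14 = s_xi * s_eta * math.sqrt(1 - rho ** 2)
rhs_14 = sx * sy * math.sqrt(1 - r ** 2)
```

The sign of $\rho$ taken here is that of $r\sigma_y - v\sigma_x$, which is the choice the text goes on to make.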

Write X= ~ , 

0* o-* 


F = = 


y — vx 


an (o'*, 2 “ 2rw*cry 4- ^ 2 <J* 2 )4 

the quadrants A and B of the XY-plane that correspond to the portions a and b of
the xy-plane have as common corner the point (-h, -k), where

x 


h = 


y-vx 


From (8), 


(cry 2 - 2rvcr x a- y + v*<rx)^ 

V= [ [—SfEs^ '-Ir^-wr+V) 

ja+bJ 2va t <r y vl — r 8 


.(15), 

.(16). 


dXdY, 


so that the chance of obtaining an index not less than v is 

0—JIMX <m 

Here A and k are given by (15) and (16); equation (10) provides two values of 
p; we decide which is appropriate by noting that as oo , the point 




so that to make 1 — 0 we must take 

ra v — vcr m 


.(18). 


(<Ty 2 — 2rva x (r y -f t; 2 <j* 2 )* 

Tables are already in existence by means of which the numerical value of C
may be found*. These tables show over a range of values of $\rho$ extending from -1
to +1, the value of

l l 


n oo roo roc 

!«««--/ J 


,-ar ^ x '~*' >XY+r '>dXdY 


tk 2tt Vl — p 2 

* Tables for Statisticians and Biometricians, Part II, Tables VIII and IX.













for positive values of h and k ; for evaluating the integral when h and k are not 
both positive, we use the relations 

fJI ■ 

it 

+ 1» [ k z ^ dXdY - 

Thus if the h and k provided by (15) and (16) be both positive or both negative,
we have 

0=1-r +r -l=e-i x 'da; + 2 f°° f" Z(p)dXdY .(17a); 

J\h\ J \k\ v2tt J \h\ J \ k\ 

while if h and k be of opposite sign, 


r oo rao 1 roo foo 

C7= + —La-**■(**- 2 Z(~p)dXdY. 

' ./|*| ^|*lV27T ^|*| '1*1 


.(176) 


Frequency Distribution of the Index . 

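By differentiating V with respect to v, we have the frequency distribution of v.
Since h does not vary with v,

$\dfrac{dV}{dv} = \dfrac{\partial V}{\partial\rho}\dfrac{\partial\rho}{\partial v} + \dfrac{\partial V}{\partial k}\dfrac{\partial k}{\partial v}$ ......(19).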

By a well-known property of the normal surface, 


!{_L_ 

(27t Vi — p* 


i i 
e 21-y 


1 (X»-2 P XY+Y')\ 


I 


0 s 


so that (17) gives 


'0JT0F 

1 


_!T = 2 _ 

3p 2w\/l-p* 


2Wl-^ 

1 1 


.-lih {X '- 2 < ,XY+Y,) 


7T 


Vi — p* 


e"ai-p« 

s 2f^( 


(* 3 - 2p** + *■) 


1 1 / , ^_ 2 rf.I + 


P\ 

V) 


by (15), (16), and (9). 

From (11), -p%- Y’l (1 ^’* Hr VZ’ > - 

v dv («r/ — 2rv<r e o-y + <r,)* 

which, with (11), (18), and (20), gives 

-I * 

( 21 -r* \o* <r, r,*/ 


q-,g-y Vl -r* 


dVdp m _ < 

dp dv ’ir(a* — 2.rvc t <Ty +af)' 


( 20 ) 


(21). 








432 Distribution of the Index in a Normal Bivariate Population 

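From (16) we find

$$(\sigma_y^2 - 2rv\sigma_x\sigma_y + v^2\sigma_x^2)^{3/2}\,\frac{dk}{dv} = \sigma_y(r\bar y\sigma_x - \bar x\sigma_y) + v\sigma_x(r\bar x\sigma_y - \bar y\sigma_x) \qquad (22),$$

and from (17)

$$\frac{\partial C}{\partial k} = \frac{1}{2\pi\sqrt{1-\rho^2}}\left\{\int_{-\infty}^{h} - \int_{h}^{\infty}\right\} e^{-\frac{1}{2(1-\rho^2)}(X^2 - 2\rho kX + k^2)}\,dX$$

$$= \frac{1}{2\pi}\, e^{-\frac12 k^2}\left\{\int_{-\infty}^{\frac{h-\rho k}{\sqrt{1-\rho^2}}} e^{-\frac12 u^2}\,du - \int_{\frac{h-\rho k}{\sqrt{1-\rho^2}}}^{\infty} e^{-\frac12 u^2}\,du\right\}$$

$$= -\frac{1}{\pi}\, e^{-\frac12 k^2}\int_0^{\frac{\rho k - h}{\sqrt{1-\rho^2}}} e^{-\frac12 u^2}\,du \qquad (23).$$

Inserting the values of h, k, and $\rho$ in (23), we have, using (20), (21), and (22), the frequency distribution of v:

$$\psi(v) = -\frac{dC}{dv} = \frac{1}{\pi}\,\frac{\sigma_x\sigma_y\sqrt{1-r^2}}{\sigma_y^2 - 2rv\sigma_x\sigma_y + v^2\sigma_x^2}\; e^{-\frac{1}{2(1-r^2)}\left(\frac{\bar x^2}{\sigma_x^2} - \frac{2r\bar x\bar y}{\sigma_x\sigma_y} + \frac{\bar y^2}{\sigma_y^2}\right)}$$

$$\qquad + \frac{1}{\pi}\,\frac{\sigma_y(r\bar y\sigma_x - \bar x\sigma_y) + v\sigma_x(r\bar x\sigma_y - \bar y\sigma_x)}{(\sigma_y^2 - 2rv\sigma_x\sigma_y + v^2\sigma_x^2)^{3/2}}\; e^{-\frac12\frac{(\bar y - v\bar x)^2}{\sigma_y^2 - 2rv\sigma_x\sigma_y + v^2\sigma_x^2}} \int_0^{\frac{\sigma_y(r\bar y\sigma_x - \bar x\sigma_y) + v\sigma_x(r\bar x\sigma_y - \bar y\sigma_x)}{\sigma_x\sigma_y\{(1-r^2)(\sigma_y^2 - 2rv\sigma_x\sigma_y + v^2\sigma_x^2)\}^{1/2}}} e^{-\frac12 u^2}\,du \qquad (24).$$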
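The ordinate (24) is easily evaluated numerically. The following Python sketch is illustrative only: the function name `index_density` and its argument names are assumptions of the sketch, not notation from the paper, and the integral factor is expressed through the error function of the standard library.

```python
import math

def index_density(v, mx, my, sx, sy, r):
    """Ordinate psi(v) of the index v = y/x, as the sum of the two terms of (24).

    mx, my are the means, sx, sy the standard deviations, r the correlation.
    """
    s2 = sy**2 - 2.0 * r * v * sx * sy + v**2 * sx**2            # sigma_y^2 - 2 r v sx sy + v^2 sx^2
    q = (mx / sx)**2 - 2.0 * r * mx * my / (sx * sy) + (my / sy)**2
    first = (sx * sy * math.sqrt(1.0 - r * r) / (math.pi * s2)
             * math.exp(-q / (2.0 * (1.0 - r * r))))
    m = sy * (r * my * sx - mx * sy) + v * sx * (r * mx * sy - my * sx)   # right-hand side of (22)
    upper = m / (sx * sy * math.sqrt((1.0 - r * r) * s2))                # upper limit of the integral
    # integral of exp(-u^2/2) from 0 to `upper`
    integral = math.sqrt(math.pi / 2.0) * math.erf(upper / math.sqrt(2.0))
    second = m / (math.pi * s2**1.5) * math.exp(-(my - v * mx)**2 / (2.0 * s2)) * integral
    return first + second
```

When both means are zero, the components uncorrelated and the standard deviations equal, the second term vanishes and the function reduces to the Cauchy ordinate $1/\{\pi(1+v^2)\}$, which provides a simple check.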


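We can obtain this distribution in a somewhat more direct manner. If we write

$$x = x, \qquad v = \frac{y}{x} \qquad (25),$$

so that

$$\left|\frac{\partial(x, y)}{\partial(x, v)}\right| = |x|,$$

we have, from (6), the equation to the joint distribution of x and v:

$$\phi(x, v) = \frac{|x|}{2\pi\sigma_x\sigma_y\sqrt{1-r^2}}\; e^{-\frac{1}{2(1-r^2)}\left\{\left(\frac{x-\bar x}{\sigma_x}\right)^2 - 2r\,\frac{(x-\bar x)(vx-\bar y)}{\sigma_x\sigma_y} + \left(\frac{vx-\bar y}{\sigma_y}\right)^2\right\}} \qquad (26).$$

On integrating this equation with respect to x, we arrive at (24). As we should expect, (24) is not altered if we increase $\bar x$, $\bar y$, $\sigma_x$, and $\sigma_y$ in the same ratio*.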


Distribution of the Index in a Curtailed Normal Population . 

The two terms of which the second member of (24) is the sum are essentially 
positive; accordingly, the moments of the index-distribution $\psi(v)$ will be infinite
with the contributions of the first of these terms. It is obvious that these infinities
would not arise if we restricted (x, y) to some limited region in the positive quadrant,
since then the index-distribution would have a limited range. 

[* An erroneous solution of this problem having been sent to me by Mr G. A. Baker, the above
solution (24) by Mr Fieller was forwarded to him with permission to use it. The result in Equation (24)
was published by Mr Baker in The Annals of Mathematical Statistics, Vol. III. p. 5, February, 1932, the
mention of Mr Fieller being unfortunately overlooked. Ed.]









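Suppose, for example, that we take the joint distribution of x and y to be

$$z = z_0\, e^{-\frac{1}{2(1-r^2)}\left\{\left(\frac{x-\bar x}{\sigma_x}\right)^2 - 2r\,\frac{(x-\bar x)(y-\bar y)}{\sigma_x\sigma_y} + \left(\frac{y-\bar y}{\sigma_y}\right)^2\right\}} \qquad (27),$$

provided that (x, y) lies inside the ellipse

$$\left(\frac{x-\bar x}{\sigma_x}\right)^2 - 2r\,\frac{(x-\bar x)(y-\bar y)}{\sigma_x\sigma_y} + \left(\frac{y-\bar y}{\sigma_y}\right)^2 = \lambda^2 \qquad (28),$$

which is a probability contour of the normal surface (6), and zero if (x, y) lie outside this ellipse.

Then by applying the transformation (25) and integrating out with respect to x, we have, for the ordinate of the curve of distribution of v,

$$I(\lambda) = z_0\int_{x_1(v)}^{x_2(v)} |x|\; e^{-\frac{1}{2(1-r^2)}\left\{\left(\frac{x-\bar x}{\sigma_x}\right)^2 - 2r\,\frac{(x-\bar x)(vx-\bar y)}{\sigma_x\sigma_y} + \left(\frac{vx-\bar y}{\sigma_y}\right)^2\right\}}\,dx \qquad (29),$$

where $x_1(v)$ and $x_2(v)$ are the smaller and greater of the roots of

$$\left(\frac{x-\bar x}{\sigma_x}\right)^2 - 2r\,\frac{(x-\bar x)(vx-\bar y)}{\sigma_x\sigma_y} + \left(\frac{vx-\bar y}{\sigma_y}\right)^2 = \lambda^2 \qquad (30).$$

Writing

$$\alpha = \frac{\sigma_y^2 - 2rv\sigma_x\sigma_y + v^2\sigma_x^2}{\sigma_x^2\sigma_y^2}, \qquad \beta = -\frac{\sigma_y(r\bar y\sigma_x - \bar x\sigma_y) + v\sigma_x(r\bar x\sigma_y - \bar y\sigma_x)}{\sigma_x^2\sigma_y^2}, \qquad \gamma = \frac{\bar x^2}{\sigma_x^2} - 2r\,\frac{\bar x\bar y}{\sigma_x\sigma_y} + \frac{\bar y^2}{\sigma_y^2},$$

$$\epsilon = \gamma - \frac{\beta^2}{\alpha} = \frac{(1-r^2)(v\bar x - \bar y)^2}{\sigma_y^2 - 2rv\sigma_x\sigma_y + v^2\sigma_x^2} \qquad (31),$$

we have, for (29) and (30),

$$I(\lambda) = z_0\int_{x_1}^{x_2} |x|\; e^{-\frac{1}{2(1-r^2)}(\alpha x^2 - 2\beta x + \gamma)}\,dx \qquad (32),$$

where $x_1$ and $x_2$ are the roots of

$$\alpha x^2 - 2\beta x + \gamma = \lambda^2, \quad\text{i.e.}\quad \alpha\left(x - \frac{\beta}{\alpha}\right)^2 = \lambda^2 - \epsilon \qquad (33).$$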

Now if the ellipse (28) lies entirely in the positive quadrant, the index must lie between two positive values given by the gradients of the tangents to the ellipse from the origin. For these values $x_1$ and $x_2$ coincide, so that they are given by

$$\lambda^2 - \epsilon = 0.$$

For all values of v lying between these limits, $x_1$ and $x_2$ are different and positive so that $\beta > 0$, and (32) gives

$$I(\lambda) = z_0\int_{x_1}^{x_2}\left(x - \frac{\beta}{\alpha}\right) e^{-\frac{1}{2(1-r^2)}(\alpha x^2 - 2\beta x + \gamma)}\,dx + z_0\,\frac{\beta}{\alpha}\int_{x_1}^{x_2} e^{-\frac{1}{2(1-r^2)}(\alpha x^2 - 2\beta x + \gamma)}\,dx.$$











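The first of these integrals vanishes, since $x_1$ and $x_2$ are the roots of (33); putting, in the second,

$$t = \sqrt{\frac{\alpha}{1-r^2}}\left(x - \frac{\beta}{\alpha}\right),$$

we find

$$I(\lambda) = z_0\,\frac{\beta}{\alpha^{3/2}}\,\sqrt{1-r^2}\; e^{-\frac{\epsilon}{2(1-r^2)}}\int_{-\sqrt{(\lambda^2-\epsilon)/(1-r^2)}}^{+\sqrt{(\lambda^2-\epsilon)/(1-r^2)}} e^{-\frac12 t^2}\,dt,$$

or, by (33),

$$I(\lambda) = C_0\,\frac{\beta}{\alpha^{3/2}}\; e^{-\frac{\epsilon}{2(1-r^2)}}\int_0^{\sqrt{(\lambda^2-\epsilon)/(1-r^2)}} e^{-\frac12 t^2}\,dt \qquad (34),$$

where $C_0$ is to be determined so as to make the area under the frequency curve of v
equal to unity.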


where Co is to be determined so as to make the area under the frequency curve of v 
equal to unity. 

Application to Anthropometric Data.

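We return to equation (17). If $\bar x$ be large compared with $\sigma_x$, h will be large; accordingly, $\int_h^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 X^2}\,dX$ will be negligible and so, a fortiori, will

$$\int_h^{\infty}\!\int_k^{\infty} \frac{1}{2\pi\sqrt{1-\rho^2}}\, e^{-\frac{1}{2(1-\rho^2)}(X^2 - 2\rho XY + Y^2)}\,dX\,dY \quad\text{and}\quad \int_{-\infty}^{-h}\!\int_{-k}^{\infty} \frac{1}{2\pi\sqrt{1-\rho^2}}\, e^{-\frac{1}{2(1-\rho^2)}(X^2 - 2\rho XY + Y^2)}\,dX\,dY.$$

Hence the chance C of an index not less than v will be, approximately,

$$C = \int_{-k}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 Y^2}\,dY = \int_{\frac{v\bar x - \bar y}{(\sigma_y^2 - 2rv\sigma_x\sigma_y + v^2\sigma_x^2)^{1/2}}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 Y^2}\,dY \qquad (35).$$

Thus we have Geary's result*, that if the ratio $\bar x/\sigma_x$ be large, then, approximately,

$$\frac{v\bar x - \bar y}{(\sigma_y^2 - 2rv\sigma_x\sigma_y + v^2\sigma_x^2)^{1/2}}$$

is distributed normally with unit standard deviation.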

Differentiating (35), we see that the equation to the index-distribution will be, approximately,

$$\psi(v) = -\frac{dC}{dv} = \frac{\sigma_y(\bar x\sigma_y - r\bar y\sigma_x) + v\sigma_x(\bar y\sigma_x - r\bar x\sigma_y)}{(\sigma_y^2 - 2rv\sigma_x\sigma_y + v^2\sigma_x^2)^{3/2}}\cdot\frac{1}{\sqrt{2\pi}}\; e^{-\frac12\frac{(\bar y - v\bar x)^2}{\sigma_y^2 - 2rv\sigma_x\sigma_y + v^2\sigma_x^2}} \qquad (36),$$

a form that can be deduced from (24) by neglecting the first term in the second
member, and replacing the upper limit of the integral factor in the second term by
$-\infty$.
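The approximations (35) and (36) lend themselves to direct computation. The Python sketch below is an illustration under stated assumptions: the function names and argument names are invented for the purpose, and the normal probability integral is taken from the complementary error function of the standard library.

```python
import math

def chance_not_less(v, mx, my, sx, sy, r):
    """Approximate chance C of an index y/x not less than v, following (35)."""
    s = math.sqrt(sy**2 - 2.0 * r * v * sx * sy + v**2 * sx**2)
    u = (v * mx - my) / s                       # Geary's approximately unit-normal deviate
    return 0.5 * math.erfc(u / math.sqrt(2.0))  # upper tail of the normal curve beyond u

def approx_density(v, mx, my, sx, sy, r):
    """Approximate ordinate psi(v) = -dC/dv, following (36)."""
    s2 = sy**2 - 2.0 * r * v * sx * sy + v**2 * sx**2
    num = sy * (mx * sy - r * my * sx) + v * sx * (my * sx - r * mx * sy)
    return num / s2**1.5 * math.exp(-(my - v * mx)**2 / (2.0 * s2)) / math.sqrt(2.0 * math.pi)
```

At $v = \bar y/\bar x$ the deviate is zero and the chance is one half; a central difference of `chance_not_less` reproduces `approx_density`, which is a convenient check that (36) is the derivative of (35) with its sign reversed.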

* R. C. Geary: “The Frequency Distribution of the Quotient of Two Normal Variates,” Journal of
the Royal Statistical Society, Vol. XCIII. 1930, p. 442.








It is worth emphasising the conditions under which this approximation is valid.
It is easy to see that for some values of v the ordinate calculated from (36) will be
negative; but if $\int_{\bar x/\sigma_x}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 x^2}\,dx$ vanishes to r decimal places, then in numerical
work proceeding to r decimal places or less, equations (17) and (35) will appear to
supply exactly the same values for the frequencies; in other words, the negative
frequencies furnished by (35) will be zero, to the degree of accuracy of our calculations.

Now let us return to equation (34); it shows that the effect of neglecting the
values of x, y that lie outside the ellipse (28) is to change the distribution of

$$u = \frac{v\bar x - \bar y}{\sqrt{\sigma_y^2 - 2rv\sigma_x\sigma_y + v^2\sigma_x^2}} = \sqrt{\frac{\epsilon}{1-r^2}}$$

from a form that is sensibly normal to the form

$$f(u) = C_0'\,\frac{1}{\sqrt{2\pi}}\, e^{-\frac12 u^2}\int_0^{\sqrt{\frac{\lambda^2}{1-r^2} - u^2}} e^{-\frac12 t^2}\,dt \qquad (37).$$

The ordinate of the normal distribution is thus multiplied by a factor that
decreases with the length of the ordinate; the effect on the appearance of the
curve of distribution of u, and therefore on that of the curve of distribution of v,
will be to increase the areas near the mode at the expense of the tails. But this
effect may easily be invisible, if $\lambda$ is at all large; if, for example, $\lambda^2$ be eighteen
times $1 - r^2$, no ordinate of the normal curve within a range $\pm 3\sigma$ ($\sigma = 1$) of the
mean will be altered by much more than one-tenth per cent.

Thus we have the somewhat startling result, that if $\bar x$ and $\bar y$ be positive, and
large compared with their standard deviations, then the limitation of x and y to
finite positive values can change the moments of the index-distribution from infinite 
to finite values, without having any visible effect on the appearance of the distri¬ 
bution. The paradox disappears, when we consider the difference between mathe¬ 
matical formulae and their numerical representation. Infinite moments cannot 
occur in the numerical applications of mathematical theory, any more than they 
can occur in experimental sampling. Any calculation of frequencies from the 
mathematical equation to a distribution will be performed to a certain number of 
decimal places; as calculated from the numerical frequencies, the moments will 
always be finite. When we say that the distribution has infinite moments, we 
mean that the numerical values of the moments do not tend to any finite limits, 
as we increase indefinitely the number of places to which we work; but this remark 
has no practical interest—it is, in fact, irrelevant. 

What the computer about to fit a curve wants to know, are the values that the 
moments would have, for the set of numerical frequencies calculated to the number 
of decimal places that he intends working to. In any discussion of anthropological 
data*, this number will be small enough to justify the use of (35) in place of (17), 


[* This appears to exclude from anthropological data such a character as corneal astigmatism, where
the mean = .62 dioptres, and the standard deviation .86 dioptres; thus the character may be negative as
well as positive. An index formed by the ratio of corneal astigmatism to distance of near point would
need (24) rather than (36). Ed.]






and the frequencies will appear to be those under (36), or, if $\lambda$ be taken large
enough, under (34). But since (34) is a distribution of limited range, the numerical 
values of its moments, as calculated from the numerical frequencies, will not differ 
appreciably from the values obtained by a direct mathematical process. This, 
I think, rather than any a priori rejection, as impossible, of zero or infinite values 
of the variates, is the true justification of Merrill’s method of approaching the 
subject*. 

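If $\xi$ and $\eta$ be the deviations of x and y from their mean values, so that

$$x = \bar x + \xi, \qquad y = \bar y + \eta,$$

Merrill writes

$$v = \frac{\bar y + \eta}{\bar x + \xi} = \frac{\bar y}{\bar x}\left(1 + \frac{\eta}{\bar y}\right)\left(1 - \frac{\xi}{\bar x} + \frac{\xi^2}{\bar x^2} - \frac{\xi^3}{\bar x^3} + \dots\right) \qquad (38),$$

and takes for $\mu_n'$, the nth moment about zero of the distribution of v, the mean value of

$$v^n = \left(\frac{\bar y}{\bar x}\right)^n\left(1 + \frac{\eta}{\bar y}\right)^n\left(1 - \frac{\xi}{\bar x} + \frac{\xi^2}{\bar x^2} - \dots\right)^n \qquad (39);$$

to obtain this value Merrill retains the products $\xi^r\eta^s$ as far as the eighth order and
takes for their mean values the product moments $p_{rs}$ of the normal surface (6).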


This process would not be valid if we imagined it applied to the whole of the
xy-plane, since the expansion (38) holds only if $|\xi| < \bar x$; but it is valid, if applied
to the interior of any probability contour (28) that lies in the positive quadrant.
In the case of low-order product-moments, we do not commit any serious error in
taking for the product-moments of the curtailed distribution (27) the values derived
from the whole normal surface (6), so that Merrill’s values of the moments may
legitimately be regarded as the moments of the index-distribution in a curtailed 
normal population, which are exactly what we want. 


Illustration . 

I have illustrated the preceding theory on some figures kindly supplied by 
Dr T. L. Woo from his measurements on the Biometric Laboratory’s series of Egyptian 
skulls. Table I shows the joint distribution of $T_1(L)$ and $P_2(L)$†, two measurements
made, on the temporal and parietal bones respectively, in the left-hand side
of 787 skulls.

Taking $P_2(L)$ as x, $T_1(L)$ as y, we have‡

$$\bar x = 111.207433, \quad \bar y = 86.019060, \quad \sigma_x = 5.7885, \quad \sigma_y = 3.8453, \quad r_{xy} = .173833 \qquad (40).$$

* A. S. Merrill, “Frequency Distribution of an Index when Both the Components Follow the Normal
Law,” Biometrika, Vol. XXA. 1928, pp. 53—63.

† For the precise definition of these measurements see T. L. Woo, “On the Asymmetry of the
Human Skull,” Biometrika, Vol. XXII. 1930—31, pp. 326 and 327.

‡ These constants, and those of the sample distribution of the index, were kindly calculated by
Dr T. L. Woo.





TABLE I. Distribution of $T_1(L)$ and $P_2(L)$.

[Correlation table of $T_1(L)$ (central values) against $P_2(L)$ (central values) for the 787 skulls; the cell frequencies are not recoverable.]


















TABLE II. The Index Distribution. 



The distribution of the index $v = T_1(L)/P_2(L)$ in the 787 skulls is shown in
the first column of Table II.

Let us assume that $P_2(L)$ and $T_1(L)$ are distributed in a normal surface whose
constants are given by (40)*. We have $\bar x/\sigma_x = 19.22$, so that $\int_{\bar x/\sigma_x}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 x^2}\,dx$
vanishes to something like 80 places of decimals. The frequencies of the theoretical
index-distribution (24) may therefore be calculated from (35); they are given in
the second column of Table II. (Their calculation can be effected very rapidly, by
first forming a column of the values of

$$\frac{v\bar x - \bar y}{(\sigma_y^2 - 2rv\sigma_x\sigma_y + v^2\sigma_x^2)^{1/2}}$$

corresponding to the boundaries of the frequency groups, and then interpolating into tables of the normal
curve.) Combining the tails of the theoretical distribution, above .895 and below
.675, into two single classes, and grouping together the frequencies whose centres
are .88 and .89, we find $\chi^2 = 25.1504$ and $P = .290$, so that it is quite likely that
the sample shown in the first column is a random sample from the parent population
shown in the second column in Table II.

* Actually we have for $P_2(L)$: $\beta_1 = .0002$, $\beta_2 = 3.5340$,
for $T_1(L)$: $\beta_1 = .0064$, $\beta_2 = 3.1576$.
From E. S. Pearson's table of the 5% and 1% points of the distribution of $\beta_1$ and $\beta_2$ (Biometrika,
Vol. XXII. 1930—31, p. 248; or Tables for Statisticians and Biometricians, Vol. II. Table XXXVII bis),
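The rapid calculation described in the parenthesis may be sketched as follows. This Python fragment is an illustration, not the computation actually used for Table II: the function names are invented, the constants are those of (40), the class boundaries run from .675 to .895 in steps of .01 as the text describes, and the normal probability integral is computed rather than interpolated from tables.

```python
import math

# Constants (40): x = P2(L), y = T1(L), for the 787 Egyptian skulls
MX, MY = 111.207433, 86.019060
SX, SY, R = 5.7885, 3.8453, 0.173833
N_SKULLS = 787

def chance_not_less(v):
    """Approximate chance of an index not less than v, from (35)."""
    s = math.sqrt(SY**2 - 2.0 * R * v * SX * SY + v**2 * SX**2)
    return 0.5 * math.erfc((v * MX - MY) / (s * math.sqrt(2.0)))

def class_frequencies(lowest=0.675, highest=0.895, width=0.01):
    """Theoretical frequencies for each class, with the two tails kept separately."""
    n_classes = round((highest - lowest) / width)
    bounds = [lowest + i * width for i in range(n_classes + 1)]
    freqs = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        centre = 0.5 * (lo + hi)
        # frequency in a class = sample size times the difference of the two chances
        freqs.append((centre, N_SKULLS * (chance_not_less(lo) - chance_not_less(hi))))
    lower_tail = N_SKULLS * (1.0 - chance_not_less(bounds[0]))
    upper_tail = N_SKULLS * chance_not_less(bounds[-1])
    return lower_tail, freqs, upper_tail
```

The class frequencies and the two tails together account for the whole sample of 787, and the mean computed from the class centres over a sufficiently wide range falls close to the value .7753 of Table III.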

In Table III are shown the constants of the index-distribution, calculated

(i) from the observed distribution,

(ii) from the numerical frequencies given for the theoretical distribution,

(iii) from Merrill’s formulae*.


TABLE III. Frequency Constants of the Index-Distribution.

Frequency     Calculated from
Constant      (i) Col. 1, Table II     (ii) Col. 2, Table II     (iii) Merrill’s formulae

Mean          .77469                   .775296                   .775298
S.D.          .048280                  .048646                   .048664
β1            .1111                    .0506                     .0511
β2            3.6296                   3.1073                    3.1196

It will be observed that the agreement between corresponding entries in the 
last two columns is quite satisfactory. 
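The entry for the mean in column (iii) can be checked from the expansion (38). The Python sketch below is a simplified illustration: it retains product-moments only as far as the fourth order, whereas Merrill proceeds to the eighth, and the function name is an assumption of the sketch. The normal product-moments used are $E(\xi^2) = \sigma_x^2$, $E(\xi\eta) = r\sigma_x\sigma_y$, $E(\xi^4) = 3\sigma_x^4$ and $E(\xi^3\eta) = 3r\sigma_x^3\sigma_y$, the odd-order moments vanishing.

```python
def merrill_mean_fourth_order(mx, my, sx, sy, r):
    """Mean of v = y/x from the series (38), keeping terms to the fourth order only."""
    a = (sx / mx)**2                 # sigma_x^2 / xbar^2
    c = r * sx * sy / (mx * my)      # r sigma_x sigma_y / (xbar ybar)
    # mean of (1 + eta/ybar)(1 - xi/xbar + xi^2/xbar^2 - xi^3/xbar^3 + xi^4/xbar^4)
    return (my / mx) * (1.0 + a - c + 3.0 * a * a - 3.0 * a * c)

# Constants (40)
mean_v = merrill_mean_fourth_order(111.207433, 86.019060, 5.7885, 3.8453, 0.173833)
```

With the constants (40) this agrees with the tabulated .775298 to the sixth decimal place, the neglected terms of the sixth and eighth orders being of the order of $10^{-7}$.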

If we substitute in equation (24) the values of $\bar x$, $\bar y$, $\sigma_x$, $\sigma_y$, and r given by (40),
we find that throughout the effective range of the distribution of v, the first term 
in ^fr(v) is less than c*"* 00 , while the upper limit of the integral factor in the second 
term is in the neighbourhood of - 27. These figures indicate the extreme accuracy 
with which the index-distribution is represented by (36). Substituting in that 
equation, and multiplying the second member by 787, the size of the sample, we 

find for the equation to the index-distribution 

$$y = \frac{10321.91 + 19296.6\,v}{(14.7867 - 7.7386v + 33.5067v^2)^{3/2}}\cdot\frac{1}{\sqrt{2\pi}}\; e^{-\frac12\frac{(86.01906 - 111.20743v)^2}{14.7867 - 7.7386v + 33.5067v^2}} \qquad (41);$$

the Pearson Type IV curve, fitted from Merrill’s formulae for the moments, is

$$y = 10^{-20}\times 7.83548\,(1 + 5.3695x^2)^{-73.28181}\; e^{191.84936\tan^{-1} 2.31722x} \qquad (42).$$


we find that while the values of $\beta_1$ are not significant of any departure from normality, $\beta_2$ exceeds 3.30
in less than 5% and 3.43 in less than 1% of random samples of 787 from a normal population. It is
accordingly very unlikely that $P_2(L)$ is distributed normally, but it will be seen that the departure from
normality does not seriously affect the goodness of fit of the index-distribution.

* Loc. cit. pp. 53, 56, 57.




The ordinates of these two curves are shown in the last two columns of Table II; 
the agreement between them is practically exact. The present example would 
appear to indicate, therefore, that there will in practice be little difference between 
the conclusions suggested by the two theories. Nevertheless, once it has been 
assumed that the components of the index are normally distributed, it seems 
definitely preferable to use the index-distribution (24), and its probability integral 
(17), implied in this assumption of normality. We thereby retain a consistent 
mathematical theory, by avoiding the extraneous assumption that the index- 
distribution can be represented by one of the Pearsonian curves; moreover, 
calculating the ordinates and frequencies for Geary’s approximation to (24) is 
considerably less laborious than doing so for a Type IV curve. 

It will be observed that while we get a fairly satisfactory fit of the theoretical
to the observed index-distribution, the latter is more leptokurtic than the former.
It is therefore to be expected that the fit will be improved, if we assume that $T_1(L)$
and $P_2(L)$ are distributed in a curtailed normal surface such as (27), so that the
distribution of the index becomes of the type (34). Actually, however, we find that
for the outlying individual for which $T_1(L) = 92$, $P_2(L) = 134$,

$$\left(\frac{x-\bar x}{\sigma_x}\right)^2 - 2r\,\frac{(x-\bar x)(y-\bar y)}{\sigma_x\sigma_y} + \left(\frac{y-\bar y}{\sigma_y}\right)^2 = 15.7947,$$

so that unless we are to reject this individual as a pathological anomaly, we must
take the $\lambda^2$ of the limiting contour (28) at least as large as 16. The following table
indicates the sort of value that F, the integral factor in $I(\lambda)$ (equation 34), assumes,
if we adopt this value for $\lambda^2$:



v:     .625    .635    .645    .655    .695     .735      .775
2F:    .961    .986    .994    .997    .9998    .99993    .99995

v:     .955    .945    .935    .925    .885     .845      .805
2F:    .980    .989    .994    .996    .9994    .99986    .99994


These values make it clear that there will not be any significant difference between 
the ordinates deduced from (34) and those shown in Table II; in other words, there 
will be no appreciable improvement in the fit. 
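Values of this kind may be reproduced approximately as follows. The Python sketch is illustrative and rests on assumptions that should be stated plainly: 2F is taken to be $\sqrt{2/\pi}\int_0^T e^{-\frac12 t^2}dt$ with $T^2 = \lambda^2/(1-r^2) - u^2$, as suggested by (34) and (37); the function name is invented; and $\lambda^2 = 16$ is used, whereas the tabulated figures may rest on a slightly different value, so that agreement is to be expected only to about the third decimal place.

```python
import math

# Constants (40)
MX, MY = 111.207433, 86.019060
SX, SY, R = 5.7885, 3.8453, 0.173833

def two_F(v, lam2=16.0):
    """Twice the integral factor of (34), normalised so that it tends to 1 as lam2 increases."""
    s2 = SY**2 - 2.0 * R * v * SX * SY + v**2 * SX**2
    u2 = (v * MX - MY)**2 / s2                 # square of Geary's deviate u
    t2 = lam2 / (1.0 - R * R) - u2             # square of the upper limit of the integral
    if t2 <= 0.0:
        return 0.0                             # v lies outside the range allowed by the contour
    return math.erf(math.sqrt(t2 / 2.0))       # sqrt(2/pi) * integral_0^T exp(-t^2/2) dt
```

Near the centre of the distribution the factor differs from unity only in the fifth decimal place, which is the point of the table: the curtailment leaves the ordinates sensibly unchanged.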


I have to thank Professor Pearson for his advice at several points in the course 
of this paper, and for lending me the manuscript of an unpublished lecture by him 
on Dr Merrill’s work *. 


* [The purpose of the lecture referred to lay in pointing out from a number of examples, whose
components were even more nearly normal than those of Woo’s case, that the $\beta_2$’s of index-distributions
were very considerably in excess of their theoretical value as deduced from Merrill’s process.
Mr Fieller shows that Merrill’s method and his own lead to results in fair accordance both in the
distribution of frequency and in the values of the constants. Assuming the theoretical values of both methods
to give $\beta_1 = .10$ and $\beta_2 = 3.1$, the standard error of $\beta_2$ for a sample size 787 is .2395. Thus the observed
$\beta_2$ has a deviation from the theoretical value of 2.18 times its standard error, and roughly this would
indicate a probability of less than .02 of the observed $\beta_2$ being due to a random sampling from the
theoretical population. Thus the difficulty discussed in the lecture is emphasised rather than
surmounted by a method which gives constants in accordance with Merrill’s results. Ed.]



A NOTE ON THE DISTRIBUTION OF THE 
CORRELATION RATIO. 

By JOHN WISHART, M.A., D.Sc., Clare College, Cambridge.

Introduction. 

The sampling distribution of the correlation ratio may now be said to have
been determined for three different cases, in all of which the arrays of a variable, y,
are normally distributed with a common variance in the population sampled.
Suppose that we reserve the symbol $\eta^2$ for the population value and denote by $E^2$
the square of the correlation ratio calculated from a sample. That is

$$E^2 = \frac{S\{n_i(\bar y_i - \bar y)^2\}}{S(y - \bar y)^2} \qquad (1).$$

The number of arrays is p, the ith array having $n_i$ observations; $\bar y_i$ is the mean of
the observations in the ith array, and $\bar y$ is the general mean. The first summation
is over all arrays, while the second is for all the observations of the sample,
i.e. from 1 to N, where $N = S_i(n_i)$.

Then it is known that if $\eta^2$ be zero*, the distribution of $E^2$ is

$$df = \frac{\left(\frac{n_1+n_2-2}{2}\right)!}{\left(\frac{n_1-2}{2}\right)!\,\left(\frac{n_2-2}{2}\right)!}\,(E^2)^{\frac{n_1-2}{2}}(1-E^2)^{\frac{n_2-2}{2}}\,d(E^2) \qquad (2),$$

a form which shows the identity of the form with a general class of distributions
having symmetry in $n_1$ and $n_2$, interpreted in this case as the numbers of degrees
of freedom between and within arrays, so that

$$n_1 = p - 1, \qquad n_2 = N - p.$$

If $\eta^2$ be not zero, two separate cases arise, for both of which solutions can be
derived from the two distributions of the multiple correlation coefficient given by
R. A. Fisher†.

Case (a). Here it is to be supposed that the conditions of sampling are such
that the array totals, $n_i$, vary from sample to sample. The sampling distribution
is then given by Fisher’s series (A), writing

$$R^2 = E^2, \qquad \rho^2 = \eta^2, \qquad n_1 = p - 1, \qquad n_2 = N - p,$$

provided that the expectations of y for the values of x in the sampled population
are normally distributed. This distribution has been studied at some length by
Fisher in the paper cited, and in particular the probability integral for $n_2$ even

* R. A. Fisher: Journ. Roy. Stat. Soc. Vol. LXXXV. 1922, p. 605. The distribution was also deduced
at a later date by Hotelling: Proc. Nat. Acad. Sc. Vol. XI. 1925, pp. 657—662.
† R. A. Fisher: Proc. Roy. Soc. A, 121, 1928, pp. 654—673.







was expressed in finite terms. As it was thought that the mean and second moment
coefficient of this distribution were of some interest in themselves, these quantities
were later determined for $n_2$ even, and the results inferred to hold generally*.
Such results can readily be translated into terms of the correlation ratio.


Case (b). Of more practical interest, however, is the case where the number $n_i$
in each array is supposed the same for all samples. The distribution of $E^2$ is then
that of $R^2$ in Fisher’s distribution (C), writing

*£* = - i“i’ =/>-!. rt t = N-p. 

Fisher did not study the properties of distribution (C), and the object of the
present paper is to obtain the probability integral in finite terms, and in addition
expressions for the mean and second moment coefficients, developing the distribution
as that of the square of the correlation ratio, although the results can
readily be translated in terms of any variate following the same law of distribution.

Further, both Fisher’s (A) and (C) distributions were shown to tend in the
limit as the size of the sample was increased to a third distribution (B), and it
will be shown how the results derived in this paper, equally with those previously
obtained from the (A) distribution, tend in the limit to the corresponding parameters
of the distribution (B). For this limiting form we have

= ^2?A 1P = ti 2 E 2 . 


The ( C) Distribution. 


Changing the notation as explained above, and denoting $\tfrac12 n_1$ by a, $\tfrac12 n_2$ by b,
$E^2$ by x and $\tfrac12\beta^2$ by t to simplify the mathematics, the distribution of x takes the
form

$$df = e^{-t}\,\frac{(a+b-1)!}{(a-1)!\,(b-1)!}\,x^{a-1}(1-x)^{b-1}\left[1 + \frac{a+b}{1!\,a}\,tx + \frac{(a+b)(a+b+1)}{2!\,a(a+1)}\,(tx)^2 + \dots\right]dx \qquad (3).$$

Since $n_1$ and $n_2$ are necessarily whole numbers, a and b may be integers or half
integers, but the factorial sign is used in either case, i.e. $x!$ denotes what is
generally understood by $\Gamma(x+1)$. The series within square brackets is a confluent
hypergeometric one, and may be denoted by $F(a+b, a, tx)$. Now by an application
of Kummer’s formula†, we find

$$F(a+b, a, tx) = e^{tx}F(-b, a, -tx),$$

giving, when b is an integer, a terminating series of the form

$$1 + \frac{b}{1!\,a}\,tx + \frac{b(b-1)}{2!\,a(a+1)}\,(tx)^2 + \dots + \frac{(tx)^b}{a(a+1)\dots(a+b-1)}.$$

* J. Wishart: Biometrika, Vol. XXII. 1931, pp. 353—361.
† Whittaker and Watson: Modern Analysis, § 16.11 (2nd edn. p.






At this point we shall suppose b to be an integer, returning later to a consideration
of the other case, when b is a half-integer. Noting that

$$x^{a-1}e^{tx}F(-b, a, -tx) = \frac{(a-1)!}{(a+b-1)!}\left(\frac{d}{dx}\right)^b\left(x^{a+b-1}e^{tx}\right),$$

the distribution (3) may be written in the comparatively simple form

$$df = e^{-t}\,\frac{(1-x)^{b-1}}{(b-1)!}\,f^{(b)}(x)\,dx \qquad (4),$$

where $f(x) = x^{a+b-1}e^{tx}$, and $f^{(b)}(x)$ denotes the bth differential coefficient of $f(x)$
with respect to x.


Probability Integral. 

In this form it is easy to evaluate the indefinite or probability integral of the
distribution. The range in (4) is from 0 to 1, and since $f^{(r)}(0) = 0$ for all values of
r from 0 to $b - 1$, the integral of (4) from 0 to x may be written down directly.
In fact,

$$\int_0^x df = e^{-t}\sum_{r=0}^{b-1}\frac{(1-x)^r}{r!}\,f^{(r)}(x) = e^{-t(1-x)}\,(a+b-1)!\sum_{r=0}^{b-1}\left\{\frac{(1-x)^r\,x^{a+b-1-r}}{r!\,(a+b-1-r)!}\,F(-r, a+b-r, -tx)\right\} \qquad (5),$$

involving a series which terminates in $\tfrac12 b(b+1)$ elementary terms.
Now by Taylor’s theorem

$$f(x+h) = \sum_{r=0}^{\infty}\frac{h^r}{r!}\,f^{(r)}(x).$$

Put $h = 1 - x$, and we see that (5) involves the first b terms of an expansion in
Taylor’s series, of which the complete series is

$$f(x + 1 - x) = f(1) = e^t.$$

When, therefore, we are interested in the “tail” of the distribution curve, as when
we wish to find a value of x for which the proportionate area under the curve
beyond the ordinate at x is 0.02 or 0.01, say, we may write the probability integral
in the form

$$\int_x^1 df = e^{-t}\sum_{r=b}^{\infty}\frac{(1-x)^r}{r!}\,f^{(r)}(x) \qquad (6).$$
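The finite sum in (5) may be checked against a direct quadrature of the series (3). The Python sketch below is an illustration only; the function names are assumptions, the derivatives $f^{(r)}(x)$ are formed by Leibniz’s rule, and b is required to be a positive integer as in the text.

```python
import math

def falling(c, k):
    """Product c (c-1) ... (c-k+1); equal to 1 when k = 0."""
    out = 1.0
    for i in range(k):
        out *= (c - i)
    return out

def f_derivative(r, x, a, b, t):
    """r-th derivative of f(x) = x**(a+b-1) * exp(t*x), by Leibniz's rule."""
    c = a + b - 1
    total = 0.0
    for j in range(r + 1):
        # (r-j) differentiations fall on the power of x, j on the exponential
        total += math.comb(r, j) * falling(c, r - j) * x**(c - (r - j)) * t**j
    return total * math.exp(t * x)

def prob_integral(x, a, b, t):
    """Chance that E^2 does not exceed x, from the finite sum in (5)."""
    return math.exp(-t) * sum((1.0 - x)**r / math.factorial(r) * f_derivative(r, x, a, b, t)
                              for r in range(b))

def density(x, a, b, t, terms=200):
    """Ordinate of (3), summing the confluent hypergeometric series term by term."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    term, series = 1.0, 1.0
    for k in range(terms):
        term *= (a + b + k) / (a + k) * t * x / (k + 1)
        series += term
    return math.exp(-t) * const * x**(a - 1) * (1.0 - x)**(b - 1) * series
```

For a = b = 1 the sum reduces to $x\,e^{-t(1-x)}$, and for t = 0 it reduces to the probability integral of the distribution (2), which gives two simple checks.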

If it were desired to extend Woo’s table* for values of $\eta^2$ other than zero, it
would be necessary to solve equations for x of the form

$$\sum_{r=b}^{\infty}\frac{(1-x)^r}{r!}\,g^{(r)}(x) = 0.02 \text{ and } 0.01,$$

where $g^{(r)}(x) = e^{-t}f^{(r)}(x) = e^{-t}\left(\dfrac{d}{dx}\right)^r\left(x^{a+b-1}e^{tx}\right)$.


* T. L. Woo: Biometrika, Vol. XXI. 1929, pp. 1—66. Tables for Statisticians and Biometricians, Part II,
1931, pp. 16—72. A table of the author’s, in Quart. Journ. Roy. Met. Soc. Vol. LIV. 1928, pp. 258—259,
gives the 0.05 and 0.01 levels of significance, and extends to 7 arrays and a size of sample of about 100.
It covers a range below 50, which is not covered by Woo’s table.








For given size of sample and number of arrays, and for a given value of $\eta^2$, this
would give an x (or $E^2$) beyond which there is only a 1 in 50 or a 1 in 100 chance
of a value occurring in samples from a population with this value of $\eta^2$. At best,
however, this computation will be a long and laborious business, for it does not
appear that any essential simplification is possible in the expression of our results
(5) and (6).

For the special case of samples from uncorrelated data, Woo’s tables provide,
in addition to the approximate 0.01 and 0.02 probability levels of significance, the
mean value and standard deviation of the square of the sample correlation ratio
(our x or $E^2$). We shall proceed now to determine these quantities in the general
case, still on the assumption that b is an integer.


Mean value of E 2 . 

Beginning with the form (4) of the distribution, let us multiply by x and
integrate from 0 to 1. We have

$$\bar E^2 = \frac{e^{-t}}{(b-1)!}\int_0^1 x(1-x)^{b-1}\,d\{f^{(b-1)}(x)\}$$

$$= -\frac{e^{-t}}{(b-1)!}\int_0^1\{(1-x)^{b-1} - (b-1)x(1-x)^{b-2}\}\,f^{(b-1)}(x)\,dx,$$

on integrating by parts, since the term between limits vanishes. Continuing this process we have finally

$$\bar E^2 = \frac{e^{-t}}{(b-1)!}\int_0^1\{-(b-1)(b-1)!\,(1-x) + (b-1)!\,x\}\,f'(x)\,dx$$

$$= e^{-t}\int_0^1\{1 - b(1-x)\}\,d(x^{a+b-1}e^{tx})$$

$$= 1 - b\,e^{-t}\int_0^1 x^{a+b-1}e^{tx}\,dx \qquad (7)$$

on further integration by parts. An examination of the integral in (7) shows that
when $t = 0$ we have

$$\bar E^2 = 1 - \frac{b}{a+b} = \frac{a}{a+b} = \frac{n_1}{n_1+n_2} = \frac{p-1}{N-1},$$

agreeing with the result deduced directly from the special form (2) of the distribution
when $\eta^2 = 0$. On the other hand when t becomes very large the important
part of the integrand is $e^{tx}$, whose integral from 0 to 1 is $(e^t - 1)/t$, so that the
second member of (7) behaves like $b(1 - e^{-t})/t$, which tends to zero as t tends to
infinity. Thus when $\eta^2 = 1$, we have $\bar E^2 = 1$.

The integral in (7) may be evaluated in series form in two ways, according to
whether t is less or greater than $a + b$. In the first case, expanding the exponential
and integrating term by term, we have

$$\int_0^1 x^{a+b-1}e^{tx}\,dx = \left[\frac{x^{a+b}}{a+b} + \frac{t}{1!}\,\frac{x^{a+b+1}}{a+b+1} + \frac{t^2}{2!}\,\frac{x^{a+b+2}}{a+b+2} + \dots\right]_0^1$$

$$= \frac{1}{a+b}\,F(a+b, a+b+1, t) = \frac{e^t}{a+b}\,F(1, a+b+1, -t)$$

by Kummer’s formula, F being the confluent hypergeometric series as already
defined. Thus, in general,

$$\bar E^2 = 1 - \frac{b}{a+b}\,F(1, a+b+1, -t) \qquad (8).$$

The series F is of the form

$$1 - \frac{t}{a+b+1} + \frac{t^2}{(a+b+1)(a+b+2)} - \dots$$

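and can be readily evaluated when $a + b + 1$ is large compared with t. When t is
large, however, or when $a + b$ is small whatever the value of t be, it will be found
best to transform (8), a process which is best carried out by returning to the
integral in (7) and integrating by parts. We have:

$$\int_0^1 x^{a+b-1}e^{tx}\,dx = \frac{1}{t}\left[x^{a+b-1}e^{tx}\right]_0^1 - \frac{a+b-1}{t}\int_0^1 x^{a+b-2}e^{tx}\,dx$$

$$= \frac{e^t}{t}\left[1 - \frac{a+b-1}{t} + \frac{(a+b-1)(a+b-2)}{t^2} - \dots + \frac{(-1)^{a+b-1}(a+b-1)!}{t^{a+b-1}}\,(1 - e^{-t})\right]$$

when $(a+b)$ is an integer. When $(a+b)$ is a half-integer, the other possibility, the
corresponding form is

$$\frac{e^t}{t}\left[1 - \frac{a+b-1}{t} + \frac{(a+b-1)(a+b-2)}{t^2} - \dots + \frac{(-1)^{a+b-\frac32}(a+b-1)\dots\frac32}{t^{a+b-\frac32}}\right] + \frac{(-1)^{a+b-\frac12}(a+b-1)\dots\frac12}{t^{a+b-\frac12}}\int_0^1 x^{-\frac12}e^{tx}\,dx.$$

Putting $x = u^2$ in the final integral, the series becomes

$$\frac{e^t}{t}\left[1 - \frac{a+b-1}{t} + \frac{(a+b-1)(a+b-2)}{t^2} - \dots + \frac{(-1)^{a+b-\frac32}(a+b-1)\dots\frac32}{t^{a+b-\frac32}}\left(1 - e^{-t}\int_0^1 e^{tu^2}\,du\right)\right].$$

We therefore have

$$\bar E^2 = 1 - \frac{b}{t}\left[1 - \frac{a+b-1}{t} + \frac{(a+b-1)(a+b-2)}{t^2} - \dots + \frac{(-1)^{a+b-1}(a+b-1)!}{t^{a+b-1}}\,(1 - e^{-t})\right], \quad (a+b) \text{ an integer} \qquad (9),$$

$$\bar E^2 = 1 - \frac{b}{t}\left[1 - \frac{a+b-1}{t} + \dots + \frac{(-1)^{a+b-\frac32}(a+b-1)\dots\frac32}{t^{a+b-\frac32}}\left(1 - e^{-t}\int_0^1 e^{tu^2}\,du\right)\right], \quad (a+b) \text{ a half-integer} \qquad (10).$$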
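The agreement of the series (8) with the finite form (9) may be verified numerically. The following Python sketch is illustrative; the function names are assumptions, and the second function covers only the case in which $(a+b)$ is an integer and t is positive.

```python
import math

def mean_by_series(a, b, t, terms=400):
    """Mean of E^2 from (8): 1 - b/(a+b) * F(1, a+b+1, -t), summing the series directly."""
    m = a + b
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= -t / (m + k)          # k-th term is (-t)^k / ((m+1)(m+2)...(m+k))
        total += term
    return 1.0 - b / m * total

def mean_closed_form(a, b, t):
    """Mean of E^2 from (9); (a+b) must be a positive integer and t > 0."""
    m = int(round(a + b))
    bracket, coeff = 0.0, 1.0         # coeff runs through (-1)^k (m-1)(m-2)...(m-k) / t^k
    for k in range(m - 1):
        bracket += coeff
        coeff *= -(m - 1 - k) / t
    bracket += coeff * (1.0 - math.exp(-t))   # remainder term, carrying the factor (1 - e^{-t})
    return 1.0 - b / t * bracket
```

The series form is the more convenient when t is small compared with $a + b$, the finite form when t is large; when t = 0 the series gives $a/(a+b)$, as it should.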








The series in ( 9 ) and ( 10 ) consist of a finite number of terms, from which the 
value of E % may be readily computed. When t is large compared with (a + 6 ), it 
may happen that the terms become negligible before the last, or remainder term 
is reached. If, however, this last term has to be computed, there is no difficulty 
with ( 9 ), but with ( 10 ) we have to consider methods of evaluating the integral. 
Let t (1 — w a ) = x. Then 

f 1 If* e~«dx 

e -U\-ut)d u = -- < 

' a Zt f 


.'0 


2,1 V 

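If we now expand $(1 - x/t)^{-\frac12}$ in powers of x/t, we get

$$\frac{1}{2t}\int_0^t e^{-x}\Big[1 + \frac{1}{2}\frac{x}{t} + \dots + \frac{1\cdot3\dots(2r-1)}{r!\,(2t)^r}\,x^r + \dots\Big]dx$$

$$= \frac{1}{2t}\Big[\gamma(1,t) + \frac{1}{2t}\,\gamma(2,t) + \dots + \frac{1\cdot3\dots(2r-1)}{r!\,(2t)^r}\,\gamma(r+1,t) + \dots\Big],$$

where γ(r + 1, t) is written for the incomplete gamma integral

$$\int_0^t x^r e^{-x}\,dx.$$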


The (r + 1)th term of the series in square brackets may be written

$$\frac{\Gamma(r+\frac12)}{\sqrt{\pi}\;r!\;t^r}\,\gamma(r+1,t),$$

or in terms of the Incomplete Gamma-Function $I(u,p) = \int_0^{u\sqrt{p+1}} v^p e^{-v}\,dv \big/ p!$ we
have

$$\int_0^1 e^{-t(1-u^2)}\,du = \frac{1}{2t\sqrt{\pi}}\sum_{r=0}^{\infty}\frac{\Gamma(r+\frac12)}{t^r}\,I\Big(\frac{t}{\sqrt{r+1}},\ r\Big),$$

which can be evaluated from the Tables of the Incomplete Γ-function*.

Since the complete integral I(∞, r) is equal to unity, we see that the integral
in (11) becomes

$$\frac{1}{2t}\,[1 + O(1/t)],$$

which tends to zero as t tends to infinity. This can also be inferred directly from
(11), for in the second integral the important part of the integrand when t is large
is $e^{-x}$, and the integral therefore tends to resemble

$$(1 - e^{-t})/(2t),$$

which tends to zero as t tends to infinity.

The value of (11) in direct powers of t is the uniformly convergent series

$$\sum_{r=0}^{\infty}\frac{r!\,(-4t)^r}{(2r+1)!} = e^{-t}F(\tfrac12, \tfrac32, t) = F(1, \tfrac32, -t),$$


* Tables of the Incomplete Γ-Function, H.M. Stationery Office, 1922.




John Wishart 


447 


but as computation would only be feasible from this series with t of the order of
unity or lower, it would be better to go back to a direct application of (8), especially
when it is remembered that small samples are usually of little value, and values
of (a + b) of less than 20 to 25 will be of little practical importance.
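
As a rough numerical check on the two routes to the mean described above, the following Python sketch sums the series in (8) directly and compares it with the finite form (9) for integral (a + b). The function names, the number of series terms and the trial values a = 2, b = 3, t = 2 are illustrative assumptions and are not taken from the paper.

```python
import math

def kummer_f1(c, t, terms=200):
    """Truncated series F(1, c, -t) = 1 - t/c + t^2/(c(c+1)) - ..."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= -t / (c + n)
    return total

def mean_series(a, b, t):
    """Mean of E^2 from equation (8)."""
    return 1.0 - b / (a + b) * kummer_f1(a + b + 1, t)

def mean_finite(a, b, t):
    """Mean of E^2 from the finite form (9); a + b must be an integer, t > 0."""
    n = a + b - 1
    bracket, coeff = 0.0, 1.0      # coeff = (-1)^k n(n-1)...(n-k+1) / t^k
    for k in range(n):
        bracket += coeff
        coeff *= -(n - k) / t
    bracket += coeff * (1.0 - math.exp(-t))   # remainder term carries (1 - e^-t)
    return 1.0 - b / t * bracket

print(mean_series(2, 3, 2.0), mean_finite(2, 3, 2.0))
```

The direct series loses accuracy through cancellation when t is large compared with (a + b), which is the situation in which the finite form is the better choice.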

Equation (9) is, of course, a direct consequence of the application to the 
confluent hypergeometric series in (8) of a well-known formula 

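$$F(\alpha,\gamma,x) = \frac{\Gamma(\gamma)}{\Gamma(\gamma-\alpha)}(-x)^{-\alpha}\Big\{1 - \frac{\alpha(\alpha-\gamma+1)}{x} + \frac{\alpha(\alpha+1)(\alpha-\gamma+1)(\alpha-\gamma+2)}{2!\,x^2} - \dots\Big\}$$

$$\qquad + \frac{\Gamma(\gamma)}{\Gamma(\alpha)}\,e^{x}x^{\alpha-\gamma}\Big\{1 + \frac{(1-\alpha)(\gamma-\alpha)}{x} + \frac{(1-\alpha)(2-\alpha)(\gamma-\alpha)(\gamma-\alpha+1)}{2!\,x^2} + \dots\Big\}$$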

where in our case α = 1, γ = a + b + 1, x = −t. This relation, however, breaks
down in our case when (a + b) is a half-integer owing to the second series becoming
imaginary. Certain tables of the confluent hypergeometric series have been computed
by Airey*, but they are of no use to us in the present investigation, being
only calculated for values of our (a + b + 1) equal to 1, 1½, 2, 3 and 4, and for
positive values of x (i.e. negative values of t).


Second Moment Coefficient of E².

Returning now to the form (4) of the distribution, we shall multiply by x² and
integrate from 0 to 1. We find

= [^-gCjy^d-^d [■/*-" (*)} 

I)! (2*0 -*r- l -(b-l)a*(l-xr-*}f"-»(x)dx 

- f* (2 (1 - xf- 1 -- 4 (b - 1)a (1 - «)M 

+ (b - 1) (b - 2) ** (1 - x) b ~*\ /<*-*> (x) dx 

on integrating by parts. The part taken between 0 and 1 vanishes at these limits 
in both cases. Continuing the process, we finally obtain 

m - f ’ (h CV) j <* - 1 ) (fe - 2 > <* ■- 1 ) ! ( 1 - *t 

— 2 (1> —1)(6 — 1)! *(1 — *) + (5 — 1)! «*}_/' (x) dx 
= f V* {£ (6 - 1) (6 - 2) - 5(6 - 1) a + \b (6 + l)* 1 ) d(a«+*~ 1 e te ) 

Jo 

= J (b — 1) (b - 2) - b (b — 1) e~* e te j + b (b — 1) «“* J". x^^e^dx 

+ \b(b +1)<r* *-6(6 +1) e~* 

$$= 1 + b(b-1)\,e^{-t}\int_0^1 x^{a+b-1}e^{tx}\,dx - b(b+1)\,e^{-t}\int_0^1 x^{a+b}e^{tx}\,dx \qquad (12).$$


* J. R. Airey: British Association, Reports for 1926 and 1927.






If in (12) we put t = 0, it becomes

$$\mu_2'(E^2) = 1 + \frac{b(b-1)}{a+b} - \frac{b(b+1)}{a+b+1} = \frac{a(a+1)}{(a+b)(a+b+1)} = \frac{n_1(n_1+2)}{(n_1+n_2)(n_1+n_2+2)} = \frac{p^2-1}{N^2-1},$$

agreeing with the direct evaluation of the uncorrected second moment coefficient
from the special form (2) of the distribution when η² = 0. On the other hand, when
t is infinite the same considerations as were taken into account in the determination
of the mean value of E² show that the second and third members of (12)
vanish, and we have μ₂′(E²) = 1 when η² = 1.

Evaluating the integrals in (12) by expansion of $e^{tx}$ and integrating term by
term, we obtain

$$\mu_2'(E^2) = 1 + \frac{b(b-1)}{a+b}\,F(1, a+b+1, -t) - \frac{b(b+1)}{a+b+1}\,F(1, a+b+2, -t) \qquad (13).$$

Now

$$\sigma^2_{E^2} = b(b+1)\Big[\frac{F(1, a+b+1, -t)}{a+b} - \frac{F(1, a+b+2, -t)}{a+b+1}\Big] - \frac{b^2}{(a+b)^2}\,[F(1, a+b+1, -t)]^2$$

$$= \frac{b(b+1)}{(a+b)(a+b+1)}\,F(2, a+b+2, -t) - (1-\overline{E^2})^2 \qquad (14).$$

This may also, if desired, be expressed as

$$\sigma^2_{E^2} = -\frac{b(b+1)}{a+b}\,\frac{dF}{dt} - \frac{b^2}{(a+b)^2}\,F^2,$$


where F represents the series F(1, a + b + 1, −t), i.e. the same series that occurs
in the expression (8) for the mean value of E². This series will have been calculated
in any case to obtain the mean of E², either directly or by means of the forms (9) or (10), and
if a table were to be prepared of the function F it should be accompanied, for
completeness, by a table of its derivative, which could either be calculated directly
from the confluent hypergeometric series in (14) or in terms of the differences of
the table of F. Direct calculation by (14) involves computation from the series

$$1 - \frac{2t}{a+b+2} + \frac{3t^2}{(a+b+2)(a+b+3)} - \dots,$$

and will not be feasible if t is large compared with (a + b). If this is the case, the
simplest way to get the appropriate form for $\sigma^2_{E^2}$ is to differentiate the series in
(9) or (10) with respect to t. We have


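$$F = \frac{a+b}{t}\Big[1 + \sum_{r=1}^{a+b-1}\frac{(-1)^r(a+b-1)\dots(a+b-r)}{t^r} + \frac{(-1)^{a+b}(a+b-1)!}{t^{a+b-1}}\,e^{-t}\Big] \quad (a+b)\ \text{an integer},$$

$$F = \frac{a+b}{t}\Big[1 + \sum_{r=1}^{a+b-\frac32}\frac{(-1)^r(a+b-1)\dots(a+b-r)}{t^r} + \frac{(-1)^{a+b-\frac12}(a+b-1)\dots\frac12}{t^{a+b-\frac12}}\;2t\,e^{-t}\!\int_0^1 e^{tu^2}du\Big] \quad (a+b)\ \text{a half-integer}.$$

Whence

$$\frac{dF}{dt} = -\frac{a+b}{t^2}\Big[1 + \sum_{r=1}^{a+b-1}\frac{(-1)^r(r+1)(a+b-1)\dots(a+b-r)}{t^r} + \frac{(-1)^{a+b}(a+b)!}{t^{a+b-1}}\,e^{-t}\Big(1+\frac{t}{a+b}\Big)\Big] \quad (a+b)\ \text{an integer} \qquad (15),$$

$$\frac{dF}{dt} = -\frac{a+b}{t^2}\Big[1 + \sum_{r=1}^{a+b-\frac32}\frac{(-1)^r(r+1)(a+b-1)\dots(a+b-r)}{t^r} - \frac{(-1)^{a+b-\frac12}(a+b-1)\dots\frac12}{t^{a+b-\frac32}}\Big\{1 - 2(a+b)\Big(1+\frac{t}{a+b}\Big)e^{-t}\!\int_0^1 e^{tu^2}du\Big\}\Big] \quad (a+b)\ \text{a half-integer} \qquad (16).$$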

Case when b is a half-integer. 

So far the results for the probability integral, and the mean value and variance
of E², from the distribution (3) have only been proved for the case when b is an
integer. The series F(−b, a, −tx) was then a terminating one, and it was possible
to express the distribution in terms of the bth differential coefficient, with regard
to x, of the function $f(x) = x^{a+b-1}e^{tx}$. When b is a half-integer, the other possibility,
it will be found convenient to use the theory of non-integral differentiation*. The
appropriate theory may be briefly outlined as follows: Beginning with the function
f(x), let us consider the operation of integrating it repeatedly between the limits
0 and x. The first integral is $\int_0^x f(\xi)\,d\xi$. We have then

$$\Big(\int_0^x\Big)^2 f(\xi)\,d\xi^2 = \int_0^x dy\int_0^y f(\xi)\,d\xi = \int_0^x (x-\xi)\,f(\xi)\,d\xi.$$

Repeating the operation, we have

$$\Big(\int_0^x\Big)^3 f(\xi)\,d\xi^3 = \int_0^x dy\int_0^y (y-\xi)\,f(\xi)\,d\xi = \int_0^x f(\xi)\,d\xi\int_\xi^x (y-\xi)\,dy = \int_0^x \frac{(x-\xi)^2}{2!}\,f(\xi)\,d\xi.$$

In general

$$\Big(\int_0^x\Big)^n f(\xi)\,d\xi^n = \int_0^x \frac{(x-\xi)^{n-1}}{(n-1)!}\,f(\xi)\,d\xi.$$

Putting n = ½, we have

$$\Big(\int_0^x\Big)^{\frac12} f(\xi)\,d\xi^{\frac12} = \frac{1}{\sqrt{\pi}}\int_0^x \frac{f(\xi)\,d\xi}{\sqrt{x-\xi}}.$$

This expression may be regarded as the definition of the ½th order integral of
the function f(x).

* I am indebted to Professor Norbert Wiener for this suggestion and for references to the literature
of the subject.




To apply this result let $f(x) = x^{a+b-1}e^{tx}$ as before. Then we may write

. m 

where h (x) denotes the function within square brackets. 

Returning now to (4), the distribution of x may be written, formally at least, as

$$df = \frac{e^{-t}}{(b-1)!}\,(1-x)^{b-1}\,h^{(b+\frac12)}(x)\,dx \qquad (18),$$

and b + ½ is now an integer.

Nature of the function h (x) and its derivatives. 

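Putting v = ξ/x in the integral of (17), we have

$$\frac{1}{\sqrt{\pi}}\int_0^x \frac{\xi^{a+b-1}e^{t\xi}}{\sqrt{x-\xi}}\,d\xi = \frac{x^{a+b-\frac12}}{\sqrt{\pi}}\int_0^1 v^{a+b-1}(1-v)^{-\frac12}e^{txv}\,dv,$$

i.e.

$$h(x) = \frac{(a+b-1)!}{(a+b-\frac12)!}\,x^{a+b-\frac12}\,F(a+b,\ a+b+\tfrac12,\ tx)$$

$$\quad\;\; = \frac{(a+b-1)!}{(a+b-\frac12)!}\,x^{a+b-\frac12}\,e^{tx}\,F(\tfrac12,\ a+b+\tfrac12,\ -tx) \qquad (19),$$

by Kummer's formula, F denoting as before the confluent hypergeometric series.
In the first of the two forms given (19) may be differentiated repeatedly without
much difficulty, and we have the results

$$h'(x) = \frac{(a+b-1)!}{(a+b-\frac32)!}\,x^{a+b-\frac32}\,F(a+b,\ a+b-\tfrac12,\ tx) \qquad (20),$$

and in general

$$h^{(r)}(x) = \frac{(a+b-1)!}{(a+b-\frac12-r)!}\,x^{a+b-\frac12-r}\,F(a+b,\ a+b+\tfrac12-r,\ tx)$$

$$\qquad\;\; = \frac{(a+b-1)!}{(a+b-\frac12-r)!}\,x^{a+b-\frac12-r}\,e^{tx}\,F(\tfrac12-r,\ a+b+\tfrac12-r,\ -tx) \qquad (21).$$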

Putting r = b + ½ and substituting in (18), we get immediately the form (3) of the
distribution. An alternative form for h(x), useful for computation purposes when
t is large, may be obtained by putting x − ξ = u²/2t in the integral leading to (19).
We then have

$$h(x) = \frac{2}{\sqrt{2\pi t}}\,x^{a+b-1}e^{tx}\int_0^{\sqrt{2tx}} e^{-\frac12u^2}\Big(1-\frac{u^2}{2tx}\Big)^{a+b-1}du.$$

On expanding $(1 - u^2/2tx)^{a+b-1}$ by the binomial theorem, and integrating term
by term, we have the result

$$h(x) = \frac{2}{\sqrt{t}}\,x^{a+b-1}e^{tx}\Big[m_0(\sqrt{2tx}) - \frac{a+b-1}{2tx}\,m_2(\sqrt{2tx}) + \frac{1\cdot3\,(a+b-1)(a+b-2)}{2!\,(2tx)^2}\,m_4(\sqrt{2tx}) - \dots\Big] \qquad (22),$$










where $m_{2r}(\sqrt{2tx})$ denotes the (2r)th Incomplete Normal Moment Function

$$m_{2r}(\sqrt{2tx}) = \frac{1}{\sqrt{2\pi}}\int_0^{\sqrt{2tx}} u^{2r}e^{-\frac12u^2}\,du \Big/ (2r-1)(2r-3)\dots1,$$

$$m_0(\sqrt{2tx}) = \frac{1}{\sqrt{2\pi}}\int_0^{\sqrt{2tx}} e^{-\frac12u^2}\,du.$$

$m_0$ may therefore be obtained from Sheppard's tables*, and the even m's required
are tabulated up to $m_{10}$*. The upper limit of $m_{2r}(\sqrt{2tx})$, when t becomes infinite,
is in all cases 0·5.


Probability Integral. 

Returning now to (18) the probability integral is

$$\frac{e^{-t}}{(b-1)!}\int_0^x (1-x)^{b-1}\,h^{(b+\frac12)}(x)\,dx.$$

This may be integrated successively by parts, putting $h^{(b+\frac12)}(x)\,dx = d\{h^{(b-\frac12)}(x)\}$,
etc., and we find




er* \*h'{x)dx 

7T Jo VT~^- X 


,* fc + 6-I-,W r I); ^(4 - a + 6 + * - r, - te)j- 

e~ l 


.(23). 

7T JO V 1 — X 


%/ 7T 

on substituting for h^ r) (x) from (21). This is the analogous form to our result (5), 
and may be completed by evaluating the integral in (23). 

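h′(x) may be written, from (20),

$$h'(x) = \sum_{r=0}^{\infty}\frac{(a+b+r-1)!}{(a+b+r-\frac32)!\;r!}\;t^r x^{a+b+r-\frac32}.$$

Consequently

$$\frac{e^{-t}}{\sqrt{\pi}}\int_0^x\frac{h'(x)\,dx}{\sqrt{1-x}} = \frac{e^{-t}}{\sqrt{\pi}}\sum_{r=0}^{\infty}\frac{(a+b+r-1)!\;t^r}{(a+b+r-\frac32)!\;r!}\int_0^x x^{a+b+r-\frac32}(1-x)^{-\frac12}\,dx$$

$$= e^{-t}\sum_{r=0}^{\infty}\frac{t^r}{r!}\,I_x(a+b+r-\tfrac12,\ \tfrac12),$$

where $I_x(p, q)$ is the incomplete Beta function

$$\frac{(p+q-1)!}{(p-1)!\,(q-1)!}\int_0^x x^{p-1}(1-x)^{q-1}\,dx.$$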


The required integral can be expressed, therefore, in a simple series of incomplete
Beta functions, which can be obtained for small values of (a + b) from the tables
prepared in the Galton Laboratory, and now at press†. For the special case of

* Tables for Statisticians, Part I, pp. 2 and 23; Part II, p. 147.

† See also Biometrika, Vol. XXII. pp. 274–288, and Tables for Statisticians, Part II, 1931, Table XXV.






p or q equal to ½ the incomplete Beta function can be expressed in terms of the
symmetrical integral as follows:

$$I_x(p, \tfrac12) = 1 - 2I_\theta(2p-1),$$

where

$$I_\theta(2p-1) = \int_0^\theta \cos^{2p-1}\theta\,d\theta \Big/ \Big\{2\int_0^{\pi/2}\cos^{2p-1}\theta\,d\theta\Big\},$$

and $\theta = \cos^{-1}\sqrt{x}$. The best method of evaluating $I_\theta(2p-1)$, in the absence of
tables, is one I gave some years ago* in terms of a series of powers of 1/(2p − 1).

When x = 1, $I_1(p, q) = 1$, and therefore

$$\frac{e^{-t}}{\sqrt{\pi}}\int_0^1\frac{h'(x)\,dx}{\sqrt{1-x}} = e^{-t}\sum_{r=0}^{\infty}\frac{t^r}{r!} = 1.$$

Mean value of E².

Taking the form (18) of the distribution, let us multiply by x and integrate 
from 0 to 1. The procedure is exactly similar to that in the earlier part of the 
paper, where b was considered an integer, and the part taken between the limits 
0 and 1 on integrating by parts vanishes in every case. Finally we are left with 


E* - ,yrT7i f i~ ( b ~ 4 > (* - 1) «> ~ 2) • • • § (1 - 


+ (6 - ]) (6 — 2)... (1 - h! (*) dx 


J»( Vtt (1 — a')) 

er l (f l — (2ft — 1) lb (x) dx f 1 2 bxh' (x) dx 

J i) 


- if 

V 7T Uo 


Vl - X 


V 1 — X 


e~* f* It (x) dx 2be~ l [ x 

V;J.Vr-r" 


V' 


7T .0 


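$$= 1 - \frac{2b\,e^{-t}}{\sqrt{\pi}}\int_0^1\sqrt{1-x}\;h'(x)\,dx$$

$$= 1 - \frac{2b\,e^{-t}}{\sqrt{\pi}}\sum_{r=0}^{\infty}\frac{(a+b+r-1)!\;t^r}{(a+b+r-\frac32)!\;r!}\int_0^1 x^{a+b+r-\frac32}(1-x)^{\frac12}\,dx \qquad \text{from (20)}$$

$$= 1 - b\,e^{-t}\sum_{r=0}^{\infty}\frac{t^r}{r!\,(a+b+r)}$$

$$= 1 - \frac{b\,e^{-t}}{a+b}\,F(a+b,\ a+b+1,\ t).$$

Finally

$$\overline{E^2} = 1 - \frac{b}{a+b}\,F(1,\ a+b+1,\ -t).$$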


Thus our result (8) for the mean value of E² is shown to hold in general for all
values, integral and half-integral, of b. The alternative forms (9) and (10), for
large t, follow as a matter of course.


* Biometrika, Vol. XVII. 1925, pp. 469–472. See Tables for Statisticians, Part II, 1931, pp. ccxxi
and 289. Note that this formula is applicable generally for any value of N (our 2p − 1) and should be
regarded as superseding the former method and Table XLV.






Second Moment Coefficient of E².

We repeat the former procedure of multiplying by x² and integrating successively
by parts, stopping when we reach an integral involving h′(x). The parts
between the limits vanish, and we have

#••'<**)- f‘ {(b - t)(b - 1) (b -1) (b - 2)... f (1 - 

- 2 (b - 1) (b -1) (b - 2) ... (1 - *)»-<*-*) 

+ (b -1) (b - 2 ) ... £(1 - «/-<*+»>) h' («) dte 

= er* f (!-*)»- + 

JO l 2 V7T V7T J 

SvttJo vl — x 

-i-Tf Vl-#A'(a?)<ir+£6(6 +1) [ (1 - x)*h'(x) dx\. 

V7r L-'o vl — # Jo Jo J 

The first and second of these integrals have already been evaluated, while the 
third may be obtained directly from (20) by a suitable modification of the pro¬ 
cedure already outlined. We then have 

(E>)-1 -26^. i r[( .^ + - ) + »(» + *•)«- S,^+>7 <f+ 6+7+T) 

‘ 1 -a+V'^ 0 + b - a * h * ’•- +H2 ' *> 


$$= 1 - \frac{2b}{a+b}\,F(1,\ a+b+1,\ -t) + \frac{b(b+1)}{(a+b)(a+b+1)}\,F(2,\ a+b+2,\ -t).$$

This, though not of precisely the same form as (13), is seen to be equivalent to it
when we derive $\sigma^2_{E^2}$, for subtracting the square of the mean

$$(\overline{E^2})^2 = 1 - \frac{2b}{a+b}\,F(1,\ a+b+1,\ -t) + \frac{b^2}{(a+b)^2}\,\{F(1,\ a+b+1,\ -t)\}^2,$$

we see that

$$\sigma^2_{E^2} = \frac{b(b+1)}{(a+b)(a+b+1)}\,F(2,\ a+b+2,\ -t) - (1-\overline{E^2})^2,$$

which agrees exactly with (14). This result, therefore, holds generally for the
variance of E² whether b is an integer or a half-integer.
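
The agreement between the second-moment form (13) and the variance form (14) can be checked numerically. The Python sketch below is only an illustration: the function names, the truncation of the confluent hypergeometric series and the trial values of a, b and t are assumptions made for the check.

```python
def kummer(alpha, c, x, terms=300):
    """Truncated confluent hypergeometric series F(alpha, c, x)."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= (alpha + n) / (c + n) * x / (n + 1)
    return total

def mean_e2(a, b, t):
    """Mean of E^2, equation (8)."""
    return 1.0 - b / (a + b) * kummer(1, a + b + 1, -t)

def second_moment_13(a, b, t):
    """Second moment about the origin, equation (13)."""
    return (1.0
            + b * (b - 1) / (a + b) * kummer(1, a + b + 1, -t)
            - b * (b + 1) / (a + b + 1) * kummer(1, a + b + 2, -t))

def variance_14(a, b, t):
    """Variance of E^2, equation (14)."""
    g = b * (b + 1) / ((a + b) * (a + b + 1)) * kummer(2, a + b + 2, -t)
    return g - (1.0 - mean_e2(a, b, t)) ** 2

a, b, t = 1.5, 2.5, 1.2   # half-integral b, as in the case just treated
print(second_moment_13(a, b, t) - mean_e2(a, b, t) ** 2, variance_14(a, b, t))
```

When t = 0 the variance reduces to ab/((a + b)²(a + b + 1)), the usual result for the Type I curve, which gives a second check.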

Limiting values of Mean and Second Moment of E².

In conclusion it will be of interest to show the identity of the limiting value
of the expressions (8) and (14) with the corresponding moments of Fisher's distribution
(C)*. To establish this we require to find the limits to which the moments
of bE² tend as b is made indefinitely large.

Now

$$\overline{E^2} = 1 - \Big(1+\frac{a}{b}\Big)^{-1}\Big\{1 - \frac{t}{b}\Big(1+\frac{a+1}{b}\Big)^{-1} + \frac{t^2}{b^2} + O\Big(\frac{1}{b^3}\Big)\Big\} \qquad \text{from (8)}$$

$$= \frac{a+t}{b} - \frac{a^2+2at+t+t^2}{b^2} + O\Big(\frac{1}{b^3}\Big) \qquad (24).$$

* Proc. Roy. Soc. A, 121, 1928, p. 668.




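Hence

$$\underset{b\to\infty}{\mathrm{Lt}}\; b\,\overline{E^2} = a + t \qquad (25).$$

Therefore

$$\sigma^2_{E^2} = \Big(1+\frac{1}{b}\Big)\Big(1+\frac{a}{b}\Big)^{-1}\Big(1+\frac{a+1}{b}\Big)^{-1}\Big\{1 - \frac{2t}{b}\Big(1+\frac{a+2}{b}\Big)^{-1} + \frac{3t^2}{b^2} + O\Big(\frac{1}{b^3}\Big)\Big\} - (1-\overline{E^2})^2 \qquad \text{from (14)}$$

$$= 1 - \frac{2(a+t)}{b} + \frac{3a^2+a+6at+4t+3t^2}{b^2} + O\Big(\frac{1}{b^3}\Big) - (1-\overline{E^2})^2.$$

$$(1-\overline{E^2})^2 = 1 - \frac{2(a+t)}{b} + \frac{3a^2+6at+2t+3t^2}{b^2} + O\Big(\frac{1}{b^3}\Big) \qquad \text{from (24)}.$$

Therefore

$$\sigma^2_{E^2} = \frac{a+2t}{b^2} + O\Big(\frac{1}{b^3}\Big),$$

$$\underset{b\to\infty}{\mathrm{Lt}}\; b^2\sigma^2_{E^2} = a + 2t \qquad (26).$$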
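
The two limits can be illustrated numerically by evaluating (8) and (14) for a large value of b. The Python sketch below uses exact rational arithmetic to avoid the cancellation that arises when b is large; the function names, the choice b = 10,000 and the number of series terms are assumptions made purely for illustration.

```python
from fractions import Fraction

def kummer_exact(alpha, c, x, terms=40):
    """Truncated series F(alpha, c, x) in exact rational arithmetic."""
    total, term = Fraction(0), Fraction(1)
    for n in range(terms):
        total += term
        term *= Fraction(alpha + n) / (c + n) * x / (n + 1)
    return total

def scaled_moments(a, b, t):
    """Return (b * mean, b^2 * variance) of E^2, using (8) and (14)."""
    a, b, t = Fraction(a), Fraction(b), Fraction(t)
    mean = 1 - b / (a + b) * kummer_exact(1, a + b + 1, -t)
    g = b * (b + 1) / ((a + b) * (a + b + 1)) * kummer_exact(2, a + b + 2, -t)
    var = g - (1 - mean) ** 2
    return float(b * mean), float(b * b * var)

b_mean, b2_var = scaled_moments(2, 10_000, 3)
print(b_mean, b2_var)   # close to a + t = 5 and a + 2t = 8 respectively
```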


Now consider the (0) distribution of Fisher. Writing = and 

£wi»a, it takes the form 


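$$df = \frac{x^{a-1}}{(a-1)!}\,e^{-x-t}\Big\{1 + \frac{tx}{a} + \frac{(tx)^2}{2!\,a(a+1)} + \dots\Big\}\,dx \qquad (27),$$

which we may put in the form

$$df = \Big(\frac{x}{t}\Big)^{\frac{a-1}{2}} e^{-x-t}\,I_{a-1}(2\sqrt{xt})\,dx,$$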

where I denotes the Bessel Function of imaginary argument. 

The range of x in (27) is from 0 to ∞, and the integral over this range is unity.
The moment generating function M is therefore defined by

$$M = \int_0^\infty e^{hx}\,df,$$

since the coefficient of $h^r/r!$ in the expansion of this expression is $\mu_r'(x)$, the rth
moment coefficient about the origin.

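Thus

$$M = \int_0^\infty \frac{x^{a-1}}{(a-1)!}\,e^{-t}e^{-x(1-h)}\Big\{1 + \frac{tx}{a} + \frac{(tx)^2}{2!\,a(a+1)} + \dots\Big\}\,dx,$$

or

$$M = \int_0^\infty \Big(\frac{x}{t}\Big)^{\frac{a-1}{2}} e^{-t}e^{-x(1-h)}\,I_{a-1}(2\sqrt{xt})\,dx.$$

Let us now change the variable, writing x(1 − h) = x′ and at the same time
writing t′ for t/(1 − h) in order to keep the series part in its present form. We
then have

$$M = \frac{e^{-t}}{(1-h)^a}\int_0^\infty \frac{x'^{\,a-1}}{(a-1)!}\,e^{-x'}\Big\{1 + \frac{t'x'}{a} + \frac{(t'x')^2}{2!\,a(a+1)} + \dots\Big\}\,dx'$$

$$= \frac{e^{-t+t'}}{(1-h)^a}\int_0^\infty \Big(\frac{x'}{t'}\Big)^{\frac{a-1}{2}} e^{-x'-t'}\,I_{a-1}(2\sqrt{x't'})\,dx'$$

$$= \frac{e^{th/(1-h)}}{(1-h)^a} \qquad (28),$$

since the integral is now of the same form as (27).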










Further, if we write

$$K = \log M = \kappa_1 h + \kappa_2\frac{h^2}{2!} + \kappa_3\frac{h^3}{3!} + \dots,$$

on expanding in powers of h, then $\kappa_r$ is the rth semi-invariant of the distribution
of x. $\kappa_1$ is equal to $\mu_1'$, while

$$\kappa_2 = \mu_2,\quad \kappa_3 = \mu_3,\quad \kappa_4 = \mu_4 - 3\mu_2^2 = \mu_2^2(\beta_2 - 3),$$

and so on.

We have therefore only to take the logarithm of (28) and find the term in $h^r/r!$
in its expansion.

Now

$$K = \frac{th}{1-h} - a\log(1-h),$$

and the term in $h^r/r!$ in this is simply

$$\kappa_r(x) = (rt + a)\,(r-1)! \qquad (29).$$

This is the general rth semi-invariant of the distribution (27), and is a generalisa¬ 
tion of the Type III result a(r — 1)! which holds when t is zero, as is otherwise 
evident from inspection of (27)*. 


In particular the mean value of x, i.e. of $\mathrm{Lt}_{b\to\infty}\,(bE^2)$, is obtained from $\kappa_r(x)$ by
putting r = 1, and is a + t (see (25)), while the second moment coefficient, σ²
(identical with $\kappa_2(x)$), is equal to a + 2t (see (26)). This establishes the relations
sought.
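
The semi-invariant law (29) can be checked against moments computed directly from (27). The Python sketch below relies on reading the series in (27) as a Poisson-weighted sum of Type III curves with indices a, a + 1, a + 2, ..., which follows on multiplying out the series term by term; the function names and the trial values of a and t are illustrative assumptions.

```python
import math

def mixture_raw_moments(a, t, terms=200):
    """First three moments about the origin of distribution (27), treated as
    a Poisson(t)-weighted sum of Type III curves with index a + j."""
    m1 = m2 = m3 = 0.0
    w = math.exp(-t)                     # Poisson weight for j = 0
    for j in range(terms):
        k = a + j
        m1 += w * k                      # mean of Type III curve, index k
        m2 += w * k * (k + 1)            # its second raw moment
        m3 += w * k * (k + 1) * (k + 2)  # its third raw moment
        w *= t / (j + 1)
    return m1, m2, m3

def semi_invariant(r, a, t):
    """Equation (29): kappa_r = (r t + a) (r - 1)!"""
    return (r * t + a) * math.factorial(r - 1)

a, t = 2.5, 3.0
m1, m2, m3 = mixture_raw_moments(a, t)
variance = m2 - m1 ** 2
third_central = m3 - 3 * m1 * m2 + 2 * m1 ** 3
print(m1, variance, third_central)
print([semi_invariant(r, a, t) for r in (1, 2, 3)])
```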


We have noted already that Fisher's (A) distribution tends, like (C), to the
common limit (B) as n₂ increases without limit. We are not concerned here with
the properties of the (A) distribution, but it is perhaps relevant to show how the
results of the previous paper† on the mean and variance of the (A) distribution
also tend, in the limit, to the values (25) and (26) just deduced. In terms of the
correlation ratio we have


J£2 = + , 

a + 6 ci + 6+1 


IV T 1/ iv T V T 

from equation (14) of the former paper. 


Now write bη² = t in our notation, and we have

¥i =z { 1 + t) + s( 1 +£ f 1 ) { 1 + P +0 ©} 

a 4* £ a® 4- (a 4-1) £ ^ /1 \ 

--.,. 

Hence Lt bE % * a 4- t 


,(30). 


* We have here another case of a Bessel Function distribution, the law for the semi-invariants of
which is even simpler than that of McKay, given in Biometrika, Vol. XXIV. 1932, pp. 39–44.
† J. Wishart: Biometrika, Vol. XXII. 1931, pp. 353–361.



Further, from equation (20) of the former paper we have 
t *~. = F(2, 2,a + b + 2, 


a * (a+ 6) (a+ 6 + 1)" 

- (> + 1) (‘ - s)' {' + tf i 1 + -t T {> + t- +1 0 ©} -<i - w 


, 2(a + «) . 3a* + a + 4a« + 4t + t* , 

= i— F -+- & +u u»r (1 - j6) - 

<)■ 


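$(1-\overline{E^2})^2$ is obtained from (30), and we have

$$\sigma^2_{E^2} = \frac{a+2t}{b^2} + O\Big(\frac{1}{b^3}\Big),$$

and

$$\underset{b\to\infty}{\mathrm{Lt}}\; b^2\sigma^2_{E^2} = a + 2t.$$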

In both cases the results are identical with those already deduced from the (C) 
distribution. 


Summary . 

Beginning with a statement of the nature of the distributions, for three distinct 
cases, of the square of the multiple correlation ratio in samples from a normal 
population, the paper goes on to consider in detail the third of these, namely that 
appropriate to the case where the array totals are supposed the same for all arrays. 
Expressions are reached for the probability integral of the distribution, and for the 
mean value and variance of the square of the sample correlation ratio. It is shown 
finally that the mean and variance tend, in common with the analogous results for 
the other general distribution previously studied, to the corresponding parameters 
of Fisher’s limiting distribution (B), as the size of the sample is increased without
limit, and the general semi-invariant of this limiting distribution is given. 




ON THE PROBABILITY THAT TWO INDEPENDENT 
DISTRIBUTIONS OF FREQUENCY ARE REALLY 
SAMPLES FROM THE SAME PARENT 
POPULATION. 


By KARL PEARSON. 


1. Let there be ν categories in either sample and suppose that the parent
population has the same ν categories, and that the chance of drawing an individual
from the sth category of the parent population is $p_s$, where s may be 1, 2, 3 ... ν.
Let the category contents of the two samples of sizes N and N′ be respectively

$$n_1,\ n_2,\ n_3,\ \dots\ n_s,\ \dots\ n_\nu$$

and

$$n_1',\ n_2',\ n_3',\ \dots\ n_s',\ \dots\ n_\nu'.$$

Then I proved in a paper published in 1911*, that if

$$\chi^2 = \underset{s=1}{\overset{s=\nu}{S}}\;\frac{NN'}{N+N'}\Big(\frac{n_s}{N}-\frac{n_s'}{N'}\Big)^2\frac{1}{p_s} \qquad (i),$$

the frequency distribution of χ² would be given by

$$y = y_0\,e^{-\frac12\chi^2}\,(\tfrac12\chi^2)^{\frac12(\nu-3)}\;[d(\tfrac12\chi^2)] \qquad (ii),$$
entering the (y*, P) tables under that value of y 2 and with n' — v. 

If the parent population be known, or we are testing whether the two samples 
are likely to have been drawn from a hypothetical population, the problem is 
perfectly straightforward, because the series $p_s$ will be given; and the answer may
be found by fairly easy arithmetical work. When, however, the parent population 
is unknown, and the question put to us is—Are the two samples likely to have been 
drawn from some unspecified parent population?—the answer is not so easy to 
provide. 


Unfortunately the answer I gave in 1911 was not the correct one. I wrote: 

“ Now the best hypothesis as to the constitution of this [the parent] population, 
on the assumption that both frequencies are random samples of it, will be that its 
sth frequency class is that indicated by the combined two samples†.”

In other words, I suggest that $p_s$ should be taken equal to $(n_s + n_s')/(N + N')$.
This was not a bad suggestion‡, if the samples were considerable in size, but it is in
no way the “best” hypothesis. If we adopt it, then

$$\chi^2 = NN'\;\underset{s=1}{\overset{s=\nu}{S}}\Big\{\Big(\frac{n_s}{N}-\frac{n_s'}{N'}\Big)^2\Big/(n_s+n_s')\Big\} \qquad (iii),$$

* Biometrika, Vol. VIII. pp. 250–254. † loc. cit. p. 252.

‡ This use of the sample values for the unknown parent population values is so usual in the theory
of errors that it frequently escapes comment; it is really only legitimate in the cases of large samples;






458 


Have two Samples the same Parent Population? 


and the table of the two samples can be arranged as a biserial contingency table,
and it has unfortunately come to be spoken of as such. The true form (i) cannot
be represented as a contingency table, and (iii) will, as a rule, not give a contingency
table for any other pair of samples which help to make up the complement
of χ²'s involved in P. It has led to many students forgetting what the $n_s + n_s'$
stands for, i.e. an approximation more or less adequate for the unknown $(N + N')p_s$.

If we try to think over what the words “best hypothesis” mean in this matter,
ought we not to interpret them as signifying that hypothesis as to the $p_s$'s which
will give the highest probability of the two samples being drawn from the same
population? Surely, if we are asking whether the two samples are likely to have
been drawn from some unknown parent population, we ought to choose for that
unknown population the one that makes the probability P of their common
sampledom a maximum, or the value of χ² as small as possible.

Now it is quite possible to determine this system of $p_s$'s. Of course when found
they may contradict some other experience we have had. But here in this problem
we are supposed to start with no past experience, i.e. with quite unknown $p_s$'s. Had
we some previous experience of the latter we should have to discover not “the
most likely” but what I have termed “the most reasonable” values of the $p_s$
series*.


Proceeding to the determination of the most likely values of the $p_s$ series, we
have to make

$$\chi^2 = \bar{\nu}\;\underset{s=1}{\overset{s=\nu}{S}}\Big(\frac{n_s}{N}-\frac{n_s'}{N'}\Big)^2\frac{1}{p_s},$$

where $\bar{\nu} = NN'/(N + N')$, a minimum, subject to the conditions that:

(a) $\underset{s=1}{\overset{s=\nu}{S}}(p_s) = 1$,

and

(b) the values obtained for the $p_s$'s are possible as probabilities, i.e. they must
all be positive and less than unity.

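Following the usual method with an indeterminate multiplier λ:

$$0 = -\bar{\nu}\;\underset{s=1}{\overset{s=\nu}{S}}\Big(\frac{n_s}{N}-\frac{n_s'}{N'}\Big)^2\frac{1}{p_s^2}\,\delta p_s,$$

$$0 = \underset{s=1}{\overset{s=\nu}{S}}(\delta p_s),$$

or we obtain the series of equations

$$-\bar{\nu}\Big(\frac{n_s}{N}-\frac{n_s'}{N'}\Big)^2\frac{1}{p_s^2} + \lambda = 0.$$

Multiply by $p_s$ and sum, we find

$$-\chi^2 + \lambda = 0.$$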


* On the determination of the “most likely” and “most reasonable” values of the constants of a
parent population see Tables for Statisticians and Biometricians, Part II, pp. clxxi et seq.



Karl Pearson 


469 


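Hence

$$p_s^2 = \frac{\bar{\nu}}{\chi^2}\Big(\frac{n_s}{N}-\frac{n_s'}{N'}\Big)^2.$$

Taking the square root of this we have a plus and minus root, and by the nature
of $p_s$ we must take the positive result. In other words

$$p_s = \sqrt{\frac{\bar{\nu}}{\chi^2_{\min}}}\Big(\frac{n_s}{N}\sim\frac{n_s'}{N'}\Big),$$

the quantity in brackets being given the positive sign.

Sum the result just obtained and we have

$$\chi^2_{\min} = \bar{\nu}\,\Big\{\underset{s=1}{\overset{s=\nu}{S}}\Big(\frac{n_s}{N}\sim\frac{n_s'}{N'}\Big)\Big\}^2 \qquad (iv),$$

$$p_s = \Big(\frac{n_s}{N}\sim\frac{n_s'}{N'}\Big)\Big/\underset{s=1}{\overset{s=\nu}{S}}\Big(\frac{n_s}{N}\sim\frac{n_s'}{N'}\Big) \qquad (v).$$

These values satisfy all the requirements of the problem. $p_s$ is always positive and
less than unity, and $S_{s=1}^{s=\nu}(p_s) = 1$. Further, the χ² obtained is a minimum and not a
maximum because we can choose the $p_s$ series so that χ² can be as large as we
please.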

Accordingly we have chosen a parent population which gives us the best chance 
that the two samples are drawn from the same parent population. 

We will now illustrate this numerically.

2. The following data have been extracted by R. A. Fisher* from Tocher's
Scottish returns for the children of a certain locality:

TABLE I.

Hair Colour.

            Fair    Red    Medium    Dark    Jet Black    Totals
Boys         592    119       849     504           36      2100
Girls        544     97       677     451           14      1783
Totals      1136    216      1526     955           50      3883

* Statistical Methods for Research Workers, 3rd ed. p. 85.
















Here N = 2100, N′ = 1783 and the proportional frequencies are:

                      Fair         Red          Medium       Dark         Jet Black    Totals
Boys (♂)              ·281,9048    ·056,6667    ·404,2857    ·240,0000    ·017,1428    1·000,0000
Girls (♀)             ·305,1038    ·054,4027    ·379,6971    ·252,9445    ·007,8519    1·000,0000
n_s/N − n_s′/N′      −·023,1990   +·002,2640   +·024,5886   −·012,9445   +·009,2909        —


Further: $NN'/(N + N') = \bar{\nu} = 964·2802$.

Everything is now ready for substitution in (i), when we have selected our $p_s$ series.
Fisher takes for his $p_s$ series the values $(n_s + n_s')/(N + N')$, which are obtained by
dividing the third row of figures by its total 3883. Let these be termed $p_s'$'s. We have

p₁′ = ·292,5573, p₂′ = ·055,6271, p₃′ = ·392,9951, p₄′ = ·245,9439, p₅′ = ·012,8766 ......(vi).

If now we square the last line of the table above, divide each square by its appropriate
$p_s'$ from (vi), add and multiply the result by $\bar{\nu}$, we find

χ² = 10·4674.

Fisher gives practically an equivalent 10·468. By our method we have five
categories (no question arises of degrees of freedom) and we look out the (χ², P)
table under 10·4674 and n′ = 5 and find P = ·034. This agrees with Fisher
who says the value of P lies between ·02 and ·05. He concludes “that the sex
difference in the classification by hair colours is probably significant as judged by
this district alone.”

We ask, however, whether another parent population could not be found
contradicting this result.

Turning back to the table at the top of the page we add without regard to sign
its third row of figures and find for its total

$$S\Big(\frac{n_s}{N}\sim\frac{n_s'}{N'}\Big) = ·072,2870,$$

whence by (v) we have

p₁ = ·320,9291, p₂ = ·031,3196, p₃ = ·340,1524, p₄ = ·179,0709, p₅ = ·128,5280 ......(vii).

Now here are another series of $p_s$'s, which will not like the $p_s'$'s in (vi) lie between
the values found for the two samples. But have we any reason to suppose the
parent population must have relative frequencies lying between those of the two
samples? All we can say, if we are in complete ignorance of the parent population,
and are determining “the most likely” values of its proportional frequencies, is that we
ought to use (vii) rather than (vi). We can either use (i) as we have already squared
the items of the third row of the table above, or more briefly use (iv); we have

χ² = 964·2802 (·072,2870)² = 5·0388,

















a value less than half that given when we use the combined samples' value

$$(n_s + n_s')/(N + N')$$

to determine $p_s$. This value of χ² leads to P = ·284, from which we should conclude
that it would be quite reasonable to suppose no sexual difference in regard to hair
colour of boys and girls in this particular locality.
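
The arithmetic of this illustration can be retraced with a short Python sketch. It is offered only as a check under stated assumptions: the counts for the Fair column are those implied by the proportional frequencies, entry of the (χ², P) table with n′ = 5 is taken to correspond to four degrees of freedom, and the variable and function names are illustrative.

```python
import math

# Table I counts: Fair, Red, Medium, Dark, Jet Black
# (the Fair counts are those implied by the proportional frequencies)
boys = [592, 119, 849, 504, 36]
girls = [544, 97, 677, 451, 14]

N, N2 = sum(boys), sum(girls)
nu_bar = N * N2 / (N + N2)                       # NN'/(N + N')
diffs = [n / N - m / N2 for n, m in zip(boys, girls)]

def chi2(p):
    """Equation (i) for a given parent series p_s."""
    return nu_bar * sum(d * d / ps for d, ps in zip(diffs, p))

def prob_4df(x):
    """P(chi-square >= x) with four degrees of freedom (closed form)."""
    return math.exp(-x / 2) * (1 + x / 2)

p_pooled = [(n + m) / (N + N2) for n, m in zip(boys, girls)]   # series (vi)
abs_sum = sum(abs(d) for d in diffs)
p_min = [abs(d) / abs_sum for d in diffs]                      # series (vii), from (v)

chi2_pooled = chi2(p_pooled)
chi2_min = chi2(p_min)        # agrees with nu_bar * abs_sum**2, i.e. (iv)

print(round(chi2_pooled, 4), round(prob_4df(chi2_pooled), 3))
print(round(chi2_min, 4), round(prob_4df(chi2_min), 3))
```

The closed form gives a probability close to ·033 for the pooled series, slightly below the tabular value ·034 quoted in the text.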

This illustration is not given to prove that there is no sexual difference in hair
colour, but as a warning against placing too great a trust in the method of
estimating the difference of two samples by the spurious contingency method, i.e.
the method which without thinking of the parent population, and the nature of the
assumption $p_s = (n_s + n_s')/(N + N')$ uses (iii) as an invariably safe test. There are
clearly numerous parent populations between (vi) and (vii) which would admit of the
two samples being reasonably considered to have arisen from the same population.

3. We can extend the conception of the first section of this paper to a more
general problem. Suppose we have a population involving two characteristics A and
B classified into u and v categories respectively, and let the chance of an individual
being drawn from the sth category of A be $p_s$ and from the tth category of B be
$q_t$. Further, let the chance of an individual being drawn combining both characteristics
be $a_{st}$, where $a_{st}$ will not be equal to $p_s \times q_t$ unless the characteristics are
independent. Now we can represent this population in the form of the table




          B_1     B_2     ...    B_t     ...    B_v
A_1       a_11    a_12    ...    a_1t    ...    a_1v     p_1
A_2       a_21    a_22    ...    a_2t    ...    a_2v     p_2
...
A_s       a_s1    a_s2    ...    a_st    ...    a_sv     p_s
...
A_u       a_u1    a_u2    ...    a_ut    ...    a_uv     p_u
          q_1     q_2     ...    q_t     ...    q_v      1

Now suppose a sample of size N be drawn from the above population as parent
population, and let the distribution in the u × v cells be represented by the scheme
below:

n_11    n_12    ...    n_1t    ...    n_1v     n_1.
n_21    n_22    ...    n_2t    ...    n_2v     n_2.
...
n_s1    n_s2    ...    n_st    ...    n_sv     n_s.
...
n_u1    n_u2    ...    n_ut    ...    n_uv     n_u.
n_.1    n_.2    ...    n_.t    ...    n_.v     N










Then the mean square contingency of such a sample is defined as

$$\phi^2 = \frac{1}{N}\,S\Big\{\frac{(n_{st}-Na_{st})^2}{Na_{st}}\Big\} \qquad (viii).$$

Similarly, if χ² be taken as Nφ², we have

$$\chi^2 + N = S\Big\{\frac{n_{st}^2}{Na_{st}}\Big\}.$$

Now if there be no correlation between the variates in the parent population, i.e.
$a_{st} = p_s \times q_t$, then on certain assumptions with regard to the size of $n_{st}$, χ² and
therefore φ² = χ²/N as thus defined will be distributed according to the law

$$y = y_0\,e^{-\frac12\chi^2}\,(\tfrac12\chi^2)^{\frac12(uv-3)} \qquad (ix),$$

for the $a_{st}$ being independent, it does not matter whether we arrange them in the
form of the above table, or in a single row or column.
Now, suppose we have no knowledge whatever of the parent population, then
what are the best values to give the unknown $a_{st}$'s?

Clearly the only relation binding on their choice is $\underset{s=1}{\overset{s=u}{S}}\;\underset{t=1}{\overset{t=v}{S}}(a_{st}) = 1$.

Let us find the minimum value of χ² subject to this condition; we have dropping
the double summation sign for brevity




0 = (S’ (8a,,). 

Using an indeterminate multiplier X, we have 




■f X = 0, 


for all values of s and t 

Multiplying by a 8t and summing 

s {m.) =x ’ or X =x a ^- +N - 


Further: 
and accordingly: 


a 8t Z 


n 9t 


n 9t 


or 


8 (l£) + N = X*-.-. + N ’ 

= and a, t = 


rijt 

"N > 




« / \ 




an^ — - 




from which 







Thus the “best” form to give the parent population margins, i.e. that which makes
χ² a minimum for this sample, are frequencies proportional to those of the sample
itself. In this case

N ) 
n*JKt 
N 

Now in order that (ix) may hold, the $a_{st}$ series must be supposed constant throughout
the sampling, i.e. we are to suppose $n_{s\cdot}$ and $n_{\cdot t}$ remain the same for all further
samples and only $n_{st}$ to vary. Thus the successive samples cannot be arranged as
contingency tables. For if we change $n_{s\cdot}$ and $n_{\cdot t}$ with each sample we are making
a new parent population with each sample, and the samples cannot then be supposed
drawn from the same parent population*.

We now reach a case of which much use has been made, but which I think 
needs very careful handling. 

As in the previous section we have

$$N(1+\phi^2) = \chi^2 + N = S\Big\{\frac{n_{st}^2}{Na_{st}}\Big\} \qquad (xi)$$

and we desire to determine whether there is independence in certain results,
which are assumed (as hypothesis) to be sampled from a population of zero
contingency.

We can best illustrate this by an example given by Dr R. A. Fisher†, dealing
with Wachter's data for back-crosses in mice. He gives a fourfold table running
as follows:

TABLE II. 



4 * 


1 ( n “ 

,«*, ±s- _ 

’ N* 



                            Black Self   Black Piebald   Brown Self   Brown Piebald   Totals
Coupling     F1 Males           88             82             75            60           305
             F1 Females         38             34             30            21           123
Repulsion    F1 Males          115             93             80           130           418
             F1 Females         96             88             95            79           358
Totals                         337            297            280           290          1204


Now neglecting the marginal totals the problem seems to be: Could the 16 cell
frequencies have arisen from sampling a parent population with no contingency?
Thus $a_{st} = p_s \times q_t$. But we have taken of the four categories four samples of the
sizes 305, 123, 418, 358. Are we going to confine our attention to such distributions

* There is nothing to prevent φ² with $n_{s\cdot}$ and $n_{\cdot t}$ varying from sample to sample being used as a
statistical coefficient, but in that case χ² is not Nφ², if we mean by χ² that which is distributed by the
law of equations (ii) or (ix). Result (x) seems to justify the usual expression for φ² as a measure of the
departure from independence ($n_{st} = n_{s\cdot}n_{\cdot t}/N$, or χ² = 0), when we have no knowledge of the parent
population.

† Statistical Methods for Research Workers, 3rd ed. p. 88.








as arise when we repeatedly make samples of these same sizes? Apparently we are
to do so and this though somewhat artificial could be carried out, if with difficulty.
We thus reach what I have termed a coefficient of partial contingency* with four
linear equations of condition among the $n_{st}$'s. This will reduce the $n'$ of our ($\chi^2$, P)
table by three, as that table takes account of the total size of the sample 1204. So
far so good. But now we come to the horizontal marginal totals. These must vary
from experiment to experiment; what reason is there for treating
$\frac{337}{1204}$, $\frac{297}{1204}$, $\frac{280}{1204}$ and $\frac{290}{1204}$
as the values of $q_1$, $q_2$, $q_3$ and $q_4$ in the hypothetical parent population
of no contingency from which we suppose our four samples extracted? Out of
all possible repetitions of the four series of crossings, those giving the horizontal
marginal totals coinciding with those of these actually performed experiments in
the several categories of mice must be of the highest rarity and we should find it
practically impossible to obtain such sets. It would seem that in choosing the
horizontal marginal totals as the values of the $q_t$'s we are really repeating what was
done in the biserial table, i.e. assuming that as we do not know the $q_t$'s we shall do
the “best” we can by supposing they agree with those of the observed frequencies.
If we assume that all further experiments are to give these same marginal
frequencies, we again limit by three more linear relations our contingency, or, in
looking up P from $\chi^2$ we must reduce $n'$ by six, or enter the table with $n' = 10$.
The problem we are then answering is this: If we made further quadruple
experiments each having the same number of mice from each form of crossing, and each
quadruple experiment giving precisely the same numbers of Black Self, Black
Piebald, Brown Self, and Brown Piebald mice, in how many cases might (on the
basis of independence) at least as great a value of $\chi^2$ be expected? But this
further limitation is unnecessary, if all we have assumed is a system of likely values
for the $q_t$'s, and suppose successive further samples not to give the same horizontal
marginal totals. Dr Fisher appears to prefer the extreme limitation to 9 degrees
of freedom. This forces his further samples into a partial contingency table form,
with all the marginal frequencies identical with those of the actual sample.


Not only will the $\chi^2$ and therefore the P, as I have shewn†, be dependent on
the number of mice resulting from each cross (they would, for example, be altered if
we had 209 instead of 418 “Repulsion, $F_1$ Males”), but in practice the repetition of
the coat colour distribution in further quadruple experiments is unattainable. It seems
awkward in applied science to say something will occur, if so and so be done, when
the doing of the latter is practically impossible.


Some, if not all, the difficulty may be surmounted, if we turn back to our value
of $\chi^2$, namely:


* “On the General Theory of Multiple Contingency, with Special Reference to Partial Contingency,”
Biometrika, Vol. xi, pp. 145–158; see in particular p. 146.
† Biometrika, Vol. xxiv. pp. 302–303, footnote.





since the parent population is assumed by hypothesis to have independence, and
ask, supposing the $p_s$'s to be fixed, what are the “best” values to give to the $q_t$'s on
the basis of the observed results? Our answer is as before that the greatest
probability for the observed results on the basis of an independent parent population
will be obtained by choosing the $q_t$'s so as to make $\chi^2$ a minimum.

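We proceed to make

$$\chi^2 = S_s S_t\left(\frac{n_{st}^2}{N p_s q_t}\right) - N \qquad \text{(xii)}$$

a minimum subject to the condition that

$$S_t(q_t) = 1.$$

We have with $\lambda$ an indeterminate multiplier

$$S_s\left(\frac{n_{st}^2}{N p_s}\right)\frac{1}{q_t^2} = \lambda,$$

or multiplying up by $q_t$ and summing for t:

$$\chi^2_{\min} + N = \lambda\, S_t(q_t) = \lambda.$$

Hence:

$$q_t = \frac{1}{\sqrt{\lambda}}\sqrt{S_s\left(\frac{n_{st}^2}{N p_s}\right)},$$

and it follows on substituting for $q_t$ in (xii) that

$$\chi^2_{\min} = \left\{S_t\sqrt{S_s\left(\frac{n_{st}^2}{N p_s}\right)}\right\}^2 - N \qquad \text{(xiii)}$$

and that

$$q_t = \sqrt{S_s\left(\frac{n_{st}^2}{N p_s}\right)} \Bigg/ S_t\sqrt{S_s\left(\frac{n_{st}^2}{N p_s}\right)},$$

or $q_t$ is always < 1 and $S_t(q_t) = 1$.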

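Applying these results to our example, we proceed first to square all the terms
$n_{st}$ and take the reciprocals of the $p_s$'s. Thus we find proceeding down a column
$S_s\left(n_{st}^2/p_s\right)$ and then take the roots of these expressions. We obtain:

$$\sqrt{S_s\left(\frac{n_{s1}^2}{p_s}\right)} = 337.3310, \qquad \sqrt{S_s\left(\frac{n_{s2}^2}{p_s}\right)} = 298.0192,$$

$$\sqrt{S_s\left(\frac{n_{s3}^2}{p_s}\right)} = 282.4913, \qquad \sqrt{S_s\left(\frac{n_{s4}^2}{p_s}\right)} = 296.9776,$$

$$S_t\sqrt{S_s\left(\frac{n_{st}^2}{p_s}\right)} = 1214.8191,$$

and accordingly:

$q_1$ = .277,6800, $q_2$ = .245,3198, $q_3$ = .232,5378, $q_4$ = .244,4624.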
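The arithmetic of this minimisation can be retraced in a few lines. The sketch below is illustrative only: the function name `best_q` and the variable names are assumptions, and the cell counts are those of Table II (with 60 and 96 taken as the readings that reproduce the printed marginal totals). It holds the $p_s$'s at the observed row proportions, forms the column roots, and normalises them to obtain the "best" $q_t$'s and the corresponding minimum $\chi^2$.

```python
import math

# Table II counts; rows are the four mating series, columns the coat colours.
table = [
    [88, 82, 75, 60],    # Coupling, F1 males
    [38, 34, 30, 21],    # Coupling, F1 females
    [115, 93, 80, 130],  # Repulsion, F1 males
    [96, 88, 95, 79],    # Repulsion, F1 females
]

def best_q(counts):
    """Return (q, chi2_min) with the row proportions p_s held fixed.

    q_t is proportional to sqrt(S_s(n_st^2 / p_s)); chi2_min follows from
    chi2 + N = (sum of those roots)^2 / N.
    """
    n_total = sum(sum(row) for row in counts)
    p = [sum(row) / n_total for row in counts]
    roots = [
        math.sqrt(sum(row[t] ** 2 / p[s] for s, row in enumerate(counts)))
        for t in range(len(counts[0]))
    ]
    total = sum(roots)
    q = [r / total for r in roots]
    chi2_min = total ** 2 / n_total - n_total
    return q, chi2_min

q, chi2_min = best_q(table)
print([round(x, 7) for x in q], round(chi2_min, 3))
```

The minimum comes out a little below the value obtained from the ordinary contingency calculation, as the argument requires.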








Dr Fisher obtains for $\chi^2$ the value

$\chi_0^2$ = 21.832,

working his table as an ordinary contingency table and his $q_t$'s, which are his
horizontal marginal totals divided by his N (= 1204), will be

$q_1$ = .279,9003, $q_2$ = .246,6777, $q_3$ = .232,5582, $q_4$ = .240,8638.
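As a check on these figures, the ordinary contingency calculation may be sketched as follows. The helper name `contingency_chi2` is hypothetical; the counts are those of Table II. Expected cell frequencies are formed from the products of the marginal totals, and the marginal colour proportions are returned alongside $\chi^2$.

```python
# Table II counts; rows are the four mating series, columns the coat colours.
table = [
    [88, 82, 75, 60],
    [38, 34, 30, 21],
    [115, 93, 80, 130],
    [96, 88, 95, 79],
]

def contingency_chi2(counts):
    """Ordinary contingency chi-square with expectations from both margins."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n_total = sum(row_totals)
    chi2 = 0.0
    for s, row in enumerate(counts):
        for t, observed in enumerate(row):
            expected = row_totals[s] * col_totals[t] / n_total
            chi2 += (observed - expected) ** 2 / expected
    q_marginal = [c / n_total for c in col_totals]
    return chi2, q_marginal

chi2_0, q_marginal = contingency_chi2(table)
print(round(chi2_0, 3), [round(x, 7) for x in q_marginal])
```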

The reader will say: “It is true your $\chi^2$ is less than Dr Fisher's, but your $q_t$'s are
so close to his and both $\chi^2$'s differ only by an insignificant difference, that it
is not clear why an attempt should be made to improve on them*.” But there is
really a wide difference between the two methods of approach! Suppose we knew or
guessed the $p_s$'s and $q_t$'s of the parent population. Then we should use Equation (xii)
to compute $\chi^2$ and we should enter the ($\chi^2$, P) table with $n' = 16$, because there
would not (beside the size of the sample) be any restrictions whatever on the
freedom of our sample. By fixing the $p_s$'s, because there is no “natural” size for
the relative numbers of matings we may make artificially, we have reduced our
conclusions, whatever they may be, to apply only to a repetition of experiments of
these sizes. We have destroyed the possibility of a general law; we cannot assert
that for other sizes of samples, we should deduce the same conclusion. We have
reached a conclusion for a narrower universe by sacrificing three degrees of freedom.

But at any rate let us attempt to reach a conclusion for a broader universe by not
sacrificing further degrees of freedom! If we make the coat-colour distributions to
be the same for every set of quadruple types of matings, our conclusions will apply
only to an absolutely restricted and practically irreproducible universe! But how
can we avoid this? Only by assuming some set of $q_t$'s for the hypothetical parent
population. How may we do this? There are two obvious ways: (a) assume that
the experiments give a good approximation to the required $q_t$'s, or (b) find the
most likely $q_t$'s by the method of this paper, i.e. those which make the probability
of the observed result a maximum. In either case we do not further restrict the
degrees of freedom. Further quadruple samples will not have the horizontal marginal
totals the same as those of the observed sample, i.e. will not take the form of
contingency tables. What then has Dr Fisher done, when he reduces his degrees of
freedom by still a further three? He has picked out of all the possible samples
those which have their distributions the same for the coat colours. His conclusions
therefore only hold for that extraordinarily limited universe.

It is of interest to note that in this particular case (it is far from being so in
every case) the observed values of the coat-colour distribution are strikingly like
the “best” values and lead to the same value practically of $\chi^2$.

If we took out the P that corresponds to that with $n' = 16 - 3 = 13$, we find
from (a) P = .040, and from (b) P = .041.

If we limit our $n'$ to 10 = 16 − 6, we find P = .010. Now what does this signify?
It denotes that with the narrower proposition when we experiment in such a manner

* The reason for the closeness of the two sets of $q_t$'s is that our sample being large the horizontal
margin total distribution gives nearly the same series of $q_t$'s as the set which produces the minimum
value of $\chi^2$.





as always to get the same relative numbers of coat colours(!) we may predict that
the four series (P = .01) are not homogeneous, whereas we are far less certain of this
(P = .04) when we take experiments which could be fairly easily repeated. But in
the former case we do not know whether the departure from independence really
lies in genetic conditions, or is due to the restraints which have been put on the
distribution of coat colour. The effect of abolishing these restraints appears at least
to suggest that they have contributed to the result.
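The two probabilities contrasted here can be recomputed without tables. The sketch below is an illustration, not the tabular interpolation used in the text: `chi2_sf` is a hypothetical helper giving the upper tail of the $\chi^2$ distribution from its finite series for integer degrees of freedom, and the degrees of freedom are taken as $n' - 1$, i.e. 12 for $n' = 13$ and 9 for $n' = 10$.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square on df degrees of freedom >= x)."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_k (x/2)^(k-1/2) / Gamma(k+1/2)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (k + 0.5)
    return total

chi2_0 = 21.832
p_wide = chi2_sf(chi2_0, 12)    # n' = 13: coat-colour margins left free
p_narrow = chi2_sf(chi2_0, 9)   # n' = 10: coat-colour margins also fixed
print(round(p_wide, 3), round(p_narrow, 3))
```

The values agree with the quoted P = .04 and P = .01 to within about one unit in the third decimal place.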

Of course the desirable thing would be to abolish all restraints except the total
size of the sample, but this is impossible with regard to the $p_s$'s, for their arbitrary
character lies in the very nature of the experiment. To show to what extent the
arbitrary choice of $p_s$'s affects our conclusions, I will take the following table, which
is obtained practically by doubling the number of “Coupling, $F_1$ Males” and
halving the number of “Repulsion, $F_1$ Males.” There appears to be nothing more
arbitrary in this than in the results of the observed mating type proportions.


TABLE III.

                            Black    Black     Brown    Brown
                            Self     Piebald   Self     Piebald   Totals

Coupling    $F_1$ Males       176      164       150      120       610
            $F_1$ Females      38       34        30       21       123
Repulsion   $F_1$ Males        58       46        40       65       209
            $F_1$ Females      96       88        95       79       358

Totals                        368      332       315      285      1300


Following Dr Fisher, that is making the relative coat-colour distribution
constrained, we obtain:

$\chi^2$ = 16.2710 and P = .062.

Thus even with the P = .05 limit, we could not now assert that the observed departures
from independence are not of a magnitude ascribable to chance.
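The same calculation applied to the modified table may be sketched as below. The helper names are assumptions, the row labels of Table III are taken by analogy with Table II, and the tail probabilities are computed from the exact series rather than read from tables, so the third decimal may differ slightly from the printed values.

```python
import math

# Table III: Coupling F1 males roughly doubled, Repulsion F1 males roughly halved.
table3 = [
    [176, 164, 150, 120],
    [38, 34, 30, 21],
    [58, 46, 40, 65],
    [96, 88, 95, 79],
]

def contingency_chi2(counts):
    """Ordinary contingency chi-square with expectations from both margins."""
    rows = [sum(r) for r in counts]
    cols = [sum(c) for c in zip(*counts)]
    n = sum(rows)
    return sum(
        (counts[s][t] - rows[s] * cols[t] / n) ** 2 / (rows[s] * cols[t] / n)
        for s in range(len(rows))
        for t in range(len(cols))
    )

def chi2_sf(x, df):
    """Upper tail of chi-square for integer df, from the finite series."""
    half = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (k + 0.5)
    return total

chi2_3 = contingency_chi2(table3)
p_constrained = chi2_sf(chi2_3, 9)   # n' = 10
p_free = chi2_sf(chi2_3, 12)         # n' = 13
print(round(chi2_3, 4), round(p_constrained, 3), round(p_free, 3))
```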

This illustration points only too strongly to the caution requisite in applying 
this method to draw conclusions from observed data; the conclusion drawn will 
depend on the number of mice, and therefore on the relative number of crossings 
made in each one of the four sections of the quadruple experiment, and these are 
at the choice of the experimenter. 

If we do not limit our judgment to experiments giving always the same relative
proportions of coat colour, then we must enter with $n' = 13$, instead of 10. If we
take the $q_t$'s of the parent population to be those given by the observed values, i.e.

$q_1$ = .283,0769, $q_2$ = .255,3846, $q_3$ = .242,3077, $q_4$ = .219,2308,

we have $\chi^2$ = 16.2710, and P = .180, a value which quite prohibits our concluding
that the departures from independence are not ascribable to chance. If we use the
“best” values of the $q_t$'s, they are:

$q_1$ = .281,5800, $q_2$ = .254,4918, $q_3$ = .241,9760, $q_4$ = .221,9522,

again unusually close to the observed values.





We then have for the “best” $\chi^2$:

$\chi^2_{\min}$ = 16.2114,

giving P = .182, which is practically the same as we find from using the observed
$q_t$'s to represent the parent population.

I trust this discussion has to some extent cleared up the difficulties which await
those who use a contingency table of multiple rows to question whether the multiple
series involved in those rows may be treated as homogeneous, i.e. possible samples
from a common parent population. We have seen that in Dr Fisher's approach to the
problem two difficulties arise. The first from the arbitrary numbers of mice in each
experimental series, and the second from the constraints enforced on the coat colours.
Dr Fisher is really proposing a series of experiments, each individual experiment giving
the same numbers of mice from each type of mating as occur in the observed
experiment (this may be needful) and further the same relative proportions of coat colours.
The latter is not needful, and would be practically impossible to achieve. Dr Fisher
concludes that if he could repeat the multiple experiment under these conditions
the $\chi^2$ would correspond to a low probability, but there is no evidence that this
result flows from the genetic constitution of the mice. Indeed, if we alter the
numbers of mice from each type of mating, i.e. alter the number of matings, and
leave in our experiments the distribution of coat colours to freely adjust themselves,
we find $\chi^2$ can be so modified as to provide a probability, which is far from
suggesting that the departures from independence are not of a magnitude to be ascribed
to chance. The method therefore needs great caution in use, and there should always
be an exact statement of what the problem supposed to be answered really is.

It will be seen that the method of the contingency table fails in stating clearly
what is the homogeneous population from which we are supposing the four series
to be drawn, and, admitting the difficulty of the problem, I prefer to attack it by
using (i) and comparing each pair of series with one another. The question is:
What value shall we give to the $p_s$'s for each pair? It seems very much better not
to use the observed sum of the columns, but to adopt for $p_s$ the value given in (v)
and accordingly for $\chi^2$ its minimum value in (iv). If we use these we have six pairs
of series to compare, and owing to the simplicity of (iv), the arithmetical work is no
more laborious than that of finding $\chi^2$ from a 4 × 4 table.

We take the reciprocals of the marginal totals column and by aid of them
determine the relative frequencies of each row. Thus we get the table:


TABLE IV.

Series    $n_1/N$      $n_2/N$      $n_3/N$      $n_4/N$       N

A         .288,5246    .268,8525    .245,9017    .196,7212    305
B         .308,9431    .276,4228    .243,9024    .170,7317    123
C         .275,1196    .222,4880    .191,3876    .311,0048    418
D         .268,1564    .245,8101    .265,3631    .220,6704    358












We now take the differences between the entries in each pair of rows, add these
differences and squaring their sum multiply by the corresponding value of
$\nu = NN'/(N + N')$, where N and N' are given in the last column.

For example, taking A and B, the differences, regardless of sign, are

.020,4185, .007,5703, .001,9993, .025,9895,

giving the sum .055,9776, and its square .0031,3349,

$\nu$ = 305 × 123/(305 + 123) = 87.6519,

and thus $\chi^2$ = 87.6519 × .0031,3349 = .2747.

The P of course is to be looked up under the number of categories, i.e. $n' = 4$.

Proceeding in this way, we find

A and B:   $\chi^2$ = 0.2747,   P = .946
A and C:   $\chi^2$ = 9.2122,   P = .027
A and D:   $\chi^2$ = 1.2414,   P = .746
B and C:   $\chi^2$ = 7.4791,   P = .059          (xv)
B and D:   $\chi^2$ = 1.8668,   P = .60
C and D:   $\chi^2$ = 7.3023,   P = .064
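The pairwise procedure just described lends itself to a short sketch. The function name `pair_chi2` and the dictionary of series are illustrative assumptions; the counts are the rows of Table II, and the relative frequencies are formed exactly rather than from the seven-figure entries of Table IV, so the last printed digit may differ.

```python
from itertools import combinations

# Rows of Table II, labelled as in Table IV.
series = {
    "A": [88, 82, 75, 60],
    "B": [38, 34, 30, 21],
    "C": [115, 93, 80, 130],
    "D": [96, 88, 95, 79],
}

def pair_chi2(first, second):
    """nu * (sum of absolute differences of relative frequencies)^2."""
    n1, n2 = sum(first), sum(second)
    abs_sum = sum(abs(a / n1 - b / n2) for a, b in zip(first, second))
    nu = n1 * n2 / (n1 + n2)
    return nu * abs_sum ** 2

pair_values = {
    (x, y): pair_chi2(series[x], series[y])
    for x, y in combinations(sorted(series), 2)
}
for pair, value in pair_values.items():
    print(pair, round(value, 4))
```

Each of the six values is then referred to the ($\chi^2$, P) table with $n' = 4$.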


By the P = *02 criterion, none of these are significant; by the P = *05, the A 
and C differences are. But we see at once that while the series A , B and D 
might be considered as samples from the same population, the probability that 
any one of them and C can be considered as such is of a much lower order. 
Accordingly we take the sum of A, B and D and test it against C. Thus we have 
the relative frequencies: 


Series       $n_1/N$      $n_2/N$      $n_3/N$      $n_4/N$       N

A + B + D    .282,4427    .259,5420    .254,4529    .203,5624    786
C            .275,1196    .222,4880    .191,3876    .311,0048    418


giving the differences regardless of sign:

.007,3231   .037,0540   .063,0653   .107,4424

with a total of .214,8848 and square .0461,7548; this with $\nu$ = 272.8804 leads to
$\chi^2$ = 12.6004 and P = .006.
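The pooled comparison can be retraced in the same manner. In this sketch the names are hypothetical, the pooled counts are the sums of rows A, B and D of Table II, and P is taken from the closed form of the $\chi^2$ tail for three degrees of freedom ($n' = 4$) rather than from tables.

```python
import math

pooled = [88 + 38 + 96, 82 + 34 + 88, 75 + 30 + 95, 60 + 21 + 79]  # A + B + D
series_c = [115, 93, 80, 130]                                       # C

n_pooled, n_c = sum(pooled), sum(series_c)
abs_sum = sum(abs(a / n_pooled - b / n_c) for a, b in zip(pooled, series_c))
nu = n_pooled * n_c / (n_pooled + n_c)
chi2_pooled = nu * abs_sum ** 2

def chi2_sf_3df(x):
    """Upper tail of chi-square on 3 degrees of freedom (closed form)."""
    half = x / 2.0
    return math.erfc(math.sqrt(half)) + 2.0 * math.sqrt(half / math.pi) * math.exp(-half)

p_pooled = chi2_sf_3df(chi2_pooled)
print(round(nu, 4), round(chi2_pooled, 4), round(p_pooled, 3))
```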

The advantages of this process are that Table IV enables us to make any 
analysis we please of the material as we proceed, and that while (xv) is not 
definite, it indicates the line on which we can get a perfectly definite result. 
Finally we are certain that whatever parent population we may take for a pair, 
that chosen is the one which makes the observed result most probable; in other 
words if definite heterogeneity may be predicted on the result thus obtained, it 
would certainly be predicted on any other assumption of a parental population 
distribution. We see that if A + B + D and C be supposed to be two samples 
from the same parent population at a maximum whatever that population might 
be, two such samples could not arise more than 6 times in 1000 trials. 


The method is straightforward, the arithmetic simple, and we take P out of our
($\chi^2$, P) table with $n'$ equal to the number of categories in the series.

This method of approaching the problem is of course not free, any more than
the contingency table process, from the variation in $\chi^2$ (and therefore in P)
produced by the artificial choice of numbers of matings and the resulting numbers
of offspring. This difficulty is introduced by the factor $\nu = NN'/(N + N')$. Supposing
we keep the relative percentages of coat colour the same in the two series as well
as the total number of mice, the maximum value of $\nu$, for A + B + D as compared
with C, will be $\frac{1}{4}(N + N') = 301$, in our case leading to $\chi^2$ = 13.8988 and P = .003,
which makes some difference in P, but not in our conclusion*. The relative size
of the samples appears in such a simple form when we proceed from the biserial
method, that it is fairly easy to appreciate their influence. Failing any “natural”
distribution of the totals in the sub-experiments, it would perhaps be the simplest
rule for the research worker to keep them as near an equality as possible.
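The influence of the factor $\nu$ is easily tabulated. The sketch below is illustrative: the function name `nu` is an assumption, the summed absolute difference .214,8848 is taken from the text, and the split of 723 mice into 514 and 209 is an assumed reading of the footnote's alternative division, where the print is unclear.

```python
def nu(n_first, n_second):
    """The sample-size factor N N' / (N + N')."""
    return n_first * n_second / (n_first + n_second)

abs_diff_sum = 0.2148848            # A + B + D against C, from the text

nu_observed = nu(786, 418)          # sizes actually observed
nu_equal = nu(602, 602)             # same total of 1204, equally divided
chi2_observed = nu_observed * abs_diff_sum ** 2
chi2_equal = nu_equal * abs_diff_sum ** 2

# Assumed reading of the footnote: A and C re-divided as 514 and 209.
nu_footnote = nu(514, 209)
nu_a_c = nu(305, 418)

print(round(nu_observed, 4), nu_equal, round(chi2_equal, 4))
print(round(nu_a_c, 5), round(nu_footnote, 5))
```

With the colour percentages held fixed, $\chi^2$ scales directly with $\nu$, which is greatest when the two samples are equal.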

* The choice of the ratio N : N' may easily make a difference in our conclusions. Thus if in A we
had had 514 mice and in C 209, giving the same total 723, we should have had $\nu$ = 148.58368 instead
of 176.33472, and, if the colour percentages in each series had remained much the same, we should have
$\chi^2$ = 7.7626 instead of 9.2122 and P = .188 instead of the .027 of (xv). The number of mice in the
C group (Repulsion, $F_1$ Males) is the largest of all four sub-experiments, and we must be very cautious
to allow for this when, on the basis of the $\chi^2$ test, we attribute to C a genetic differentiation from A,
B and D.



CERTAIN GENERALIZATIONS IN THE ANALYSIS 
OF VARIANCE* 

By S. S. WILKS, Ph.D., National Research Fellow in Mathematics, 
Columbia University, New York. 

CONTENTS.

1. Introductory Remarks
2. Solutions of two integral Equations
3. Generalization of the Variance of a Sample
4. Moments and Distribution of the Ratio of independent generalized Variances
5. Ratio of the independent Generalized Variance to any one of its principal Minors
6. Generalization of the Correlation Ratio
7. Generalization of $1 - \eta^2$
8. Generalization of “Student's” Ratio
9. Generalization of the $\lambda$-Criterion appropriate to k Samples
10. Moments and Distributions of Ratios of Determinants of Correlation Coefficients

1. Introductory Remarks. The theory of small samples has been developed to
a large extent from problems involving a single variate. The extension of the theory
to samples from multivariate populations has been made rather slowly and it is far
from complete at present. It was not until 1928 that Wishart† found the
simultaneous sampling distribution of the variances and covariances in samples from
a multivariate normal universe, whereas Fisher‡ solved the problem for a bivariate
normal population in 1915. In the same paper, Fisher found the distribution of the
correlation coefficient and in 1928 he solved the corresponding problem for the multiple
correlation coefficient§. The distribution introduced by “Student”|| in 1908 in his
analysis of the ratio of the deviation of the mean of a sample from that of the
population to the standard deviation of the sample was more rigorously obtained by
Fisher¶ in 1925. At the same time Fisher extended its application to sampling
fluctuations of regression coefficients, differences of means and other problems which

* Presented to the American Mathematical Society, March 26, 1932.

† J. Wishart: “The generalized Product Moment Distribution in Samples from a normal
multivariate Population,” Biometrika, Vol. xxA. (1928), pp. 32–52.

‡ R. A. Fisher: “Frequency Distribution of the Values of the Correlation Coefficient in Samples
from an indefinitely large Population,” Biometrika, Vol. x. (1915), pp. 507–521.

§ R. A. Fisher: “The general sampling Distribution of the Multiple Correlation Coefficient,”
Proceedings of the Royal Society of London, Series A, Vol. cxxi. (1928), pp. 654–673.

|| “Student”: “The probable Error of a Mean,” Biometrika, Vol. vi. (1908–1909), pp. 1–25.

¶ R. A. Fisher: “Applications of ‘Student’s’ distribution,” Metron, Vol. v. No. 3 (1925), pp. 90–104.


involve essentially a single variate. These ideas were generalized in 1931 by
Hotelling* who found the distribution of a quantity T which, when divided by the
square root of the number of degrees of freedom, is a natural extension of “Student's”
original z to a sample from a multivariate normal population. We find very few
additional extensions of the kind with which we are concerned existing in the
literature.

Statistical coefficients which have not been adequately generalized for samples
from a multivariate population include the variance, ratio of variances, correlation
ratio, standard error of estimate when all variates are drawn at random, and certain
maximum likelihood test criteria developed for one-variable problems by Pearson
and Neyman†. As early as 1876 Helmert‡ found the distribution of the sum of
squares of deviations of a set of normally and independently distributed quantities
from the population mean, and in 1900 Karl Pearson§ solved the same problem for
the case where there is correlation among the variates and found the distribution of
$\chi^2$. In 1908 “Student”|| suggested the form of the distribution of the sum of squares
of the deviations of the variates of a sample from the sample mean, which was
verified in 1915 by Fisher¶. By means of the distribution of the ratio of two
independently distributed variances, Fisher** has found the distribution of the multiple
correlation coefficient and the correlation ratio in samples from normal populations
in which these quantities are zero. He has extended the use of this distribution to
the problem of testing for the significance of variations in certain subvariances into
which the variance of a sample can be analyzed and has developed the theory of
intraclass correlations. Romanovsky†† introduced an extension of the ratio of variances
and found the sampling distribution of a quantity H which is the average of the
ratios of variances for two samples from a multivariate population. But this does
not seem to be a perfectly natural extension of the variance problem for samples
from multivariate populations as we shall see later. In 1928 E. S. Pearson and

* H. Hotelling: “The Generalization of ‘Student’s’ Ratio,” Annals of Mathematical Statistics,
Vol. ii. (1931), pp. 359–378.

† J. Neyman and E. S. Pearson: “On the Use and Interpretation of certain test Criteria for purposes
of statistical Inference,” Biometrika, Parts i. and ii. Vol. xxA. pp. 175–240, 263–294.

J. Neyman and E. S. Pearson: “On the Problem of k Samples,” Bulletin de l'Académie Polonaise des
Sciences et des Lettres, Série A, Sciences mathématiques, 1931, pp. 460–481.

‡ C. F. Helmert: “Über die Wahrscheinlichkeit der Potenzsummen der Beobachtungsfehler und
über einige damit in Zusammenhang stehende Fragen,” Zeitschrift für Mathematik und Physik, Vol. xxi.
(1876), pp. 192–219.

§ K. Pearson: “On the Criterion that a given set of Deviations from the probable in the case of
correlated Variables is such that it can be reasonably supposed to have arisen from Random Sampling,”
Philosophical Magazine, 5th series, Vol. L. (1900), pp. 157–175.

|| “Student”: loc. cit. [Helmert also in 1876 proved the equation for the distribution of the sums of
the squares of the deviations about the sample mean. See Astronomische Nachrichten, Bd. LXXXVIII.
S. 122, or Biometrika, Vol. xxiii. pp. 416–418. Ed.]

¶ R. A. Fisher: Biometrika, Vol. x. (1915), p. 507.

** R. A. Fisher: “On a Distribution yielding the Error Functions of several well-known Statistics,”
Proceedings of the International Mathematical Congress, Toronto (1924), Vol. ii. pp. 805–813.

†† V. Romanovsky: “On the Criteria that two given Samples belong to the same Normal
Population,” Metron, Vol. vii. No. 3 (1928), pp. 3–46.





Neyman* began a series of papers in which they adopted the principle of maximum
likelihood as a means of obtaining criteria for testing various hypotheses in
statistical inference. Among others they have obtained criteria appropriate to the
hypotheses that two or more samples are drawn from the same normal population;
that a sample is drawn from a population with a specified mean; that two or more
samples from populations having identical variances come from populations with
identical means and a similar hypothesis stated by interchanging variances and
means. They have thus far confined their work primarily to samples from normal
populations of a single variable.

Investigators dealing with samples of two or more correlated variables are
confronted with the need of extended forms of the above statistical mechanisms.
For example, measurements of several anthropological characters are obtained on
two or more groups of men; how can we test the hypothesis that they are from the
same race by a consideration of their variances and covariances? Similar problems
arise in psychology concerning certain mental tendencies of two or more groups of
individuals who have been measured on the basis of several mental traits; and so on
for other fields of statistical investigation.

In this paper it is the purpose of the author to find the moments and
distributions of some of the foregoing statistical coefficients generalized for samples from
a multivariate normal population and to exhibit a method of attack which seems
to be novel in its application. Another problem which will be considered concerns
the moments and distributions of the determinants and certain ratios of determinants
of correlation coefficients in multivariate samples, from which a certain
generalization of the multiple correlation coefficient is obtained.


2. Solutions of two integral Equations. The moments of the class of statistical
coefficients which we shall consider are of a form which is a constant multiple of
a ratio of products of gamma functions. Most of the distributions can be derived
from the solutions of two types of integral equations. We shall designate these two
types by (A) and (B) and find their solutions before considering the main part of
the problem.


Type A. The first to be considered is of the form

$$\int_0^\infty z^k f(z)\,dz = B^k\,\frac{\Gamma(a_1+k)\,\Gamma(a_2+k)\cdots\Gamma(a_n+k)}{\Gamma(a_1)\,\Gamma(a_2)\cdots\Gamma(a_n)} \qquad \text{(A)},$$

where k and the a's are real and positive and B and f(z) are independent of k.

By definition, $\Gamma(a_i+k) = \int_0^\infty \theta_i^{a_i+k-1} e^{-\theta_i}\,d\theta_i$.

Hence, as far as the moments are concerned, the problem of finding f(z) is equivalent

* J. Neyman and E. S. Pearson: Biometrika, Vol. xxA. Parts i. and ii. pp. 175–240.

J. Neyman and E. S. Pearson: Bulletin de l'Académie Polonaise des Sciences et des Lettres, Série A,
Sciences mathématiques, 1930 and 1931.






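to that of finding the distribution of the product $z = B\theta_1\theta_2\cdots\theta_n$, where $\theta_i$ has the
distribution

$$\frac{1}{\Gamma(a_i)}\,\theta_i^{a_i-1} e^{-\theta_i}\,d\theta_i, \qquad (i = 1, 2, \ldots n).$$

Letting $\theta_n = \dfrac{z}{B\theta_1\theta_2\cdots\theta_{n-1}}$, and substituting in

$$\prod_{i=1}^{n} \frac{1}{\Gamma(a_i)}\,\theta_i^{a_i-1} e^{-\theta_i}\,d\theta_i,$$

we have for the distribution of z,

$$f(z) = \frac{B^{-a_n} z^{a_n-1}}{\Gamma(a_1)\Gamma(a_2)\cdots\Gamma(a_n)} \int_0^\infty\!\!\int_0^\infty\!\!\cdots\!\int_0^\infty \theta_1^{a_1-a_n-1}\,\theta_2^{a_2-a_n-1}\cdots\theta_{n-1}^{a_{n-1}-a_n-1}$$
$$\times\; e^{-\theta_1-\theta_2-\cdots-\theta_{n-1}-\frac{z}{B\theta_1\theta_2\cdots\theta_{n-1}}}\; d\theta_1\,d\theta_2\cdots d\theta_{n-1} \qquad (1).$$

By making the transformation $\theta_1\theta_2\cdots\theta_i = v_i$, (i = 1, 2, ... n − 1), we can write

$$f(z) = \frac{B^{-a_n} z^{a_n-1}}{\Gamma(a_1)\Gamma(a_2)\cdots\Gamma(a_n)} \int_0^\infty\!\!\int_0^\infty\!\!\cdots\!\int_0^\infty v_1^{a_1-a_2-1}\,v_2^{a_2-a_3-1}\cdots v_{n-1}^{a_{n-1}-a_n-1}$$
$$\times\; e^{-v_1-\frac{v_2}{v_1}-\cdots-\frac{v_{n-1}}{v_{n-2}}-\frac{z}{Bv_{n-1}}}\; dv_1\,dv_2\cdots dv_{n-1} \qquad (2).$$

The author has succeeded in integrating this expression only for special values of
the a's and small values of n, which will be considered later.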

We note that the integral in (A) exists for all positive values of k and hence all
functions satisfying the integral equation (A) must have their kth moments identical
(k = 0, 1, 2, ...). The uniqueness of the continuous solution (2) can be established
by the use of Stekloff's* application of the theory of closure to the problem of
moments.

Since we are dealing with non-negative functions, it is to be noted that if we
had not known the range of z in (A), it could be argued from Stekloff's theory that
it must be from 0 to ∞. This type of argument is especially important in establishing
the range of statistical coefficients which we shall consider.
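As an illustration of Type A, the two simplest cases may be written out. These are offered as a worked sketch under the product representation $z = B\theta_1\cdots\theta_n$ and are not taken from the text; the second uses the standard integral $\int_0^\infty v^{\mu-1} e^{-v-c/v}\,dv = 2c^{\mu/2}K_\mu(2\sqrt{c})$, where $K_\mu$ denotes the modified Bessel function of the second kind.

```latex
% n = 1: z = B\theta_1, so z is a scaled gamma variate.
f(z) = \frac{z^{a_1-1}\, e^{-z/B}}{B^{a_1}\,\Gamma(a_1)}, \qquad
\int_0^\infty z^k f(z)\,dz = B^k\,\frac{\Gamma(a_1+k)}{\Gamma(a_1)} .

% n = 2: the single integral remaining in (1) is of Bessel type.
f(z) = \frac{B^{-a_2} z^{a_2-1}}{\Gamma(a_1)\Gamma(a_2)}
       \int_0^\infty \theta_1^{a_1-a_2-1}\,
       e^{-\theta_1 - \frac{z}{B\theta_1}}\, d\theta_1
     = \frac{2\, B^{-\frac{a_1+a_2}{2}}\, z^{\frac{a_1+a_2}{2}-1}}
            {\Gamma(a_1)\Gamma(a_2)}\,
       K_{a_1-a_2}\!\left(2\sqrt{z/B}\right).
```

In both cases the range of z is 0 to ∞, in agreement with the remark on the range above.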


Type B. Next we shall consider the integral equation

$$\int_0^B w^k g(w)\,dw = C B^k\,\frac{\Gamma(b_1+k)\,\Gamma(b_2+k)\cdots\Gamma(b_n+k)}{\Gamma(c_1+k)\,\Gamma(c_2+k)\cdots\Gamma(c_n+k)} \qquad \text{(B)},$$

where $C = \prod_{i=1}^{n} \dfrac{\Gamma(c_i)}{\Gamma(b_i)}$ and B and g(w) are independent of k; where also
the b's and c's are two sets of real and positive numbers such that there exists at
least one way of pairing them such that each b is less than its corresponding c. Thus
we assume without loss of generality that $b_i < c_i$, (i = 1, 2, ... n).


* W. Stekloff: “Quelques applications nouvelles de la théorie de fermeture au problème de
représentation approchée des fonctions et au problème des moments,” Mémoires de l'Académie Impériale des
Sciences de St Pétersbourg, Vol. xxxii. No. 4, 1914.








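Let us multiply and divide the expression on the right of (B) by $\prod_{i=1}^{n}\Gamma(c_i-b_i)$.
Then, since

$$\frac{\Gamma(b_i+k)\,\Gamma(c_i-b_i)}{\Gamma(c_i+k)} = \int_0^1 t_i^{b_i+k-1}(1-t_i)^{c_i-b_i-1}\,dt_i,$$

our problem is reduced to that of finding the distribution of the quantity $w = Bt_1t_2\cdots t_n$
where the simultaneous distribution of the t's is given by

$$\prod_{i=1}^{n} \frac{\Gamma(c_i)}{\Gamma(b_i)\,\Gamma(c_i-b_i)}\; t_i^{b_i-1}(1-t_i)^{c_i-b_i-1}\,dt_i \qquad (3).$$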
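For n = 1 the reduction can be checked numerically. The sketch below is an illustration under stated assumptions: the parameter values b = 2.5, c = 5.5, B = 3 are arbitrary, the function names are hypothetical, and Simpson's rule stands in for exact integration. It compares the kth moment of the density of $w = Bt_1$, with $t_1$ distributed as in (3), against the right-hand side of (B).

```python
import math

def density_n1(w, b, c, B):
    """g(w) for n = 1: the law of w = B * t with t distributed as in (3)."""
    K = math.gamma(c) / (math.gamma(b) * math.gamma(c - b))
    tail = max(0.0, 1.0 - w / B)          # guard against rounding just past B
    return K * B ** (-b) * w ** (b - 1) * tail ** (c - b - 1)

def moment_by_simpson(k, b, c, B, steps=20000):
    """Numerical k-th moment of g over 0..B (steps must be even)."""
    h = B / steps
    total = 0.0
    for i in range(steps + 1):
        w = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * w ** k * density_n1(w, b, c, B)
    return total * h / 3.0

def moment_from_equation_B(k, b, c, B):
    """Right-hand side of (B) for n = 1, with C = Gamma(c) / Gamma(b)."""
    C = math.gamma(c) / math.gamma(b)
    return C * B ** k * math.gamma(b + k) / math.gamma(c + k)

b_par, c_par, B_par = 2.5, 5.5, 3.0        # illustrative values with b < c
pairs = [
    (moment_by_simpson(k, b_par, c_par, B_par),
     moment_from_equation_B(k, b_par, c_par, B_par))
    for k in range(4)
]
for numeric, exact in pairs:
    print(round(numeric, 8), round(exact, 8))
```

The zeroth moment is unity, confirming that the constant C is the one required to make g(w) a distribution.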


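If we make the substitution

$$t_n = \frac{w}{Bt_1t_2\cdots t_{n-1}}$$

in (3), we have

$$g(w) = K B^{-b_n} w^{b_n-1} \int_{L_1}^{1}\!\!\int_{L_2}^{1}\!\!\cdots\!\int_{L_{n-1}}^{1} t_1^{b_1-b_n-1}\,t_2^{b_2-b_n-1}\cdots t_{n-1}^{b_{n-1}-b_n-1}$$
$$\times\,(1-t_1)^{c_1-b_1-1}(1-t_2)^{c_2-b_2-1}\cdots(1-t_{n-1})^{c_{n-1}-b_{n-1}-1}\left(1-\frac{w}{Bt_1t_2\cdots t_{n-1}}\right)^{c_n-b_n-1} dt_1\cdots dt_{n-1} \qquad (4),$$

where $K = \prod_{i=1}^{n} \dfrac{\Gamma(c_i)}{\Gamma(b_i)\,\Gamma(c_i-b_i)}$, $L_i = \dfrac{w}{Bt_0t_1\cdots t_{i-1}}$, $t_0 = 1$, (i = 1, 2, ... n − 1).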

In order to make the limits of the integrations in (4) independent of the variables, 
we shall make the following transformation 


As the result, we obtain 


<,•-! (t “ 1,2 ’"' n 1)- 


*<•>- 




1 fl 


0 JO 


f 1 c n -1 

-Jo 1 8 -> 


x (1 — (1 — ... (1 — 

x ^l-i^l - ’|^l-{*»i + vt-(l-vi))(l-5)] ’ *••• 

x £l - {»i + .(1 - vi) +... + e n _i.(1 -®i).(1 - v t )...(1 - v-*)} (l “^)J 

x dvidvg... dv n _i ... 


where 


~ 2 c n .»j, * 2 b n ~j , (t *» 1,2,... w). 

jmQ ja 0 


( 6 ), 


w 

Since the distribution of w has the range 0 to B, we have, for B > w > 0, $1 - \frac{w}{B} < 1$.


Then we can show by induction that

$$\{v_1 + v_2(1-v_1) + \cdots + v_{n-1}(1-v_1)(1-v_2)\cdots(1-v_{n-2})\}\left(1-\frac{w}{B}\right) < 1$$

for $0 < v_i < 1$, (i = 1, 2, ... n − 1). Therefore, the series which results from expanding
all of the factors in (6) involving the term (6) is uniformly convergent in the v’s over 
the field of integration and can be integrated term by term. This process yields the 
distribution function g(w) which, again, is unique. 








3. Generalization of the Variance of a Sample. Wishart* has shown that the
simultaneous distribution of the variances and covariances of a sample of N items
from an n-variate normal population is given by

$$\frac{|A_{ij}|^{\frac{N-1}{2}}}{\pi^{\frac{n(n-1)}{4}} \prod_{i=1}^{n} \Gamma\!\left(\frac{N-i}{2}\right)}\; e^{-\sum_{i,j} A_{ij}a_{ij}}\; |a_{ij}|^{\frac{N-n-2}{2}}\, da \qquad (7),$$

where $|A_{ij}|$ is the nth order determinant of the elements $A_{ij} = \dfrac{N\Delta_{ij}}{2\sigma_i\sigma_j\Delta}$. $\Delta_{ij}$ denotes
the co-factor corresponding to $\rho_{ij}$ in the determinant $\Delta = |\rho_{ij}|$ of population
correlations and $\sigma_i$ the standard deviation in the population of the ith variate. Thus,
if $\sigma_1^2\sigma_2^2\cdots\sigma_n^2\Delta$ is defined as the generalized variance of the population, then
$|A_{ij}| = N^n/[2^n\sigma_1^2\sigma_2^2\cdots\sigma_n^2\Delta]$. da is the product of the differentials of all of the a's. The
elements of the determinant $|a_{ij}|$ are the variances and covariances from the
sample defined as

$$a_{ij} = \frac{1}{N}\sum_{\alpha=1}^{N}(x_{i\alpha}-\bar{x}_i)(x_{j\alpha}-\bar{x}_j), \qquad (i, j = 1, 2, \ldots n),$$

where $\bar{x}_i = \dfrac{1}{N}\sum_{\alpha=1}^{N} x_{i\alpha}$ is the sample mean of the ith variate, and $x_{i\alpha}$ is the value of
the ith variate $x_i$, for the $\alpha$th individual.

We shall adopt as the generalized sample variance the determinant $|a_{ij}|$. This
quantity for $n$-variate samples and the ordinary variance for samples of one variate
are similar, not only in the manner in which they enter the distribution of their
component parts (there being only one part in the case of single variate samples),
but in the way they arise in maximizing likelihood functions†. For example, the
maximum of the likelihood expressed by (7) for variations of the population
parameters $A_{ij}$ is $C|a_{ij}|^{-\frac{n+1}{2}}$, and if the $L$-function of the means is taken into
account, the maximum of the joint $L$-function is proportional to $|a_{ij}|^{-\frac{n+2}{2}}$. In one-variable
samples, the maximum $L$-functions for the two cases are proportional to
$a^{-1}$ and $a^{-\frac{3}{2}}$ respectively, where $a$ is the ordinary variance.

Let us denote $|a_{ij}|$ by $\xi$ and proceed to find its $k$th moment $M_k(\xi)$. From the
fact that the integral of (7) over the field of possible values of the $a$'s is unity, we
have (using abbreviated notation)

$$\int e^{-\sum A_{ij}a_{ij}}\,|a_{ij}|^{\frac{N-n-2}{2}}\,da = \frac{\pi^{\frac{n(n-1)}{4}}\prod_{i=1}^{n}\Gamma\left(\frac{N-i}{2}\right)}{|A_{ij}|^{\frac{N-1}{2}}} \qquad (8).$$

* J. Wishart: loc. cit.

† In this paper we are primarily interested in various functions of the means, variances and
covariances of a sample from a normal population. For this reason we shall express the probability of
a sample in terms of the probability function $F$ of its means, variances and covariances rather than in
terms of the probabilities of the individual items of the sample. $F$ is, of course, a function of the
means, variances and covariances of both the sample and the population and is the product of two
independent functions $F_a$ and $F_m$, where $F_a$ is the distribution function of the variances and covariances
and $F_m$ is that of the means. For a specified sample (i) $F$, (ii) $F_a$ and (iii) $F_m$ may be considered as
functions of population parameters, and will be called likelihood functions or simply $L$-functions of
(i) the sample, (ii) the variances and covariances, and (iii) the means.





S. S. Wilks


477 


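Then

$$M_k(\xi) = \frac{1}{G}\int e^{-\sum A_{ij}a_{ij}}\,|a_{ij}|^{\frac{N+2k-n-2}{2}}\,da \qquad (9),$$

where $G$ is the expression on the right of (8). We have at once the value of the
integral in (9) by substituting $N + 2k$ for $N$ in the right side of (8). Therefore

$$M_k(\xi) = A^{-k}\,\frac{\Gamma\left(\frac{N-1}{2}+k\right)\Gamma\left(\frac{N-2}{2}+k\right)\cdots\Gamma\left(\frac{N-n}{2}+k\right)}{\Gamma\left(\frac{N-1}{2}\right)\Gamma\left(\frac{N-2}{2}\right)\cdots\Gamma\left(\frac{N-n}{2}\right)} \qquad (10),$$

where

$$A = |A_{ij}| = \frac{N^n}{2^n\sigma_1^2\sigma_2^2\cdots\sigma_n^2\Delta}.$$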

The fact that a factor $N$ is concealed in each $A_{ij}$ does not invalidate (10), because (8)
holds for all positive values of $\sigma_i$ and hence it holds when $\sigma_i$ is replaced by
$\sigma_i\sqrt{\dfrac{N+2k}{N}}$.
Clearly, this process will absorb the increment $2k$ in the $N$ multiplier of each $A_{ij}$
and will not affect the increment of $N$ entering at any other place in (8). The same
result can be achieved by transforming the $A$'s and $a$'s by letting $A_{ij} = NE_{ij}$ and
$a_{ij} = b_{ij}/N$ in (8), and finding the $k$th moment of $|b_{ij}|$, which is easily found to be
$N^{nk}M_k(\xi)$.
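As an illustration of (10), the following sketch evaluates the moment formula and compares its first moment with a Monte Carlo estimate for bivariate samples. It is not part of the paper: the function names, the sample size and the parameter values are all chosen here for illustration, and the variances use the divisor $N$ as in the definition of $a_{ij}$.

```python
import math
import random

def gen_var_moment(k, N, sigmas, delta):
    """k-th moment of the generalized variance |a_ij| according to (10).

    sigmas: population standard deviations; delta: determinant of the
    population correlations.
    """
    n = len(sigmas)
    A = N ** n / (2 ** n * math.prod(s * s for s in sigmas) * delta)
    log_ratio = sum(math.lgamma((N - i) / 2 + k) - math.lgamma((N - i) / 2)
                    for i in range(1, n + 1))
    return math.exp(log_ratio - k * math.log(A))

def simulate_bivariate_gen_var(N, s1, s2, rho, reps, seed=0):
    """Monte Carlo mean of |a_ij| for n = 2 (divisor N, as in the text)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(N):
            u, v = rng.gauss(0, 1), rng.gauss(0, 1)
            xs.append(s1 * u)
            ys.append(s2 * (rho * u + math.sqrt(1 - rho * rho) * v))
        mx, my = sum(xs) / N, sum(ys) / N
        a11 = sum((x - mx) ** 2 for x in xs) / N
        a22 = sum((y - my) ** 2 for y in ys) / N
        a12 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / N
        total += a11 * a22 - a12 * a12
    return total / reps

if __name__ == "__main__":
    exact = gen_var_moment(1, 10, (1.0, 2.0), 1 - 0.5 ** 2)
    approx = simulate_bivariate_gen_var(10, 1.0, 2.0, 0.5, 2000)
    print(exact, approx)
```

For $k = 1$ and $n = 2$ the formula reduces to the population generalized variance multiplied by $(N-1)(N-2)/N^2$, which is what the simulated mean should approach.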

If we denote the distribution of $\xi$ by $D(\xi)$, we must have

$$\int_0^\infty \xi^k D(\xi)\,d\xi = M_k(\xi) \qquad (11),$$

an integral equation of type (A).

Therefore

N-n N z n-2, 

A 2 jt 52 fooroo roo 

/)(£)«--TTv—r ... (tfctfc... Vn-i)~*e "*-'d Vl dv 2 ...dv n -i 

tt n / JO JO J <1 

"/VT-j .(12), 

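and the range of $\xi$ is from 0 to $\infty$.

If $n = 1$, we get the well-known distribution of the variance in samples of a single
variate

$$D(\xi) = \frac{\left(\frac{N}{2\sigma^2}\right)^{\frac{N-1}{2}}}{\Gamma\left(\frac{N-1}{2}\right)}\;\xi^{\frac{N-3}{2}}\,e^{-\frac{N\xi}{2\sigma^2}} \qquad (12a).$$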

For $n = 2$*

$$D(\xi) = \frac{2^{N-3}A_2^{\frac{N-2}{2}}}{\Gamma(N-2)}\;\xi^{\frac{N-4}{2}}\,e^{-2\sqrt{A_2\xi}} \qquad (12b),$$

where

$$A_2 = \frac{N^2}{4\sigma_1^2\sigma_2^2(1-\rho_{12}^2)}.$$

* If $s_1$ and $s_2$ are the two sample standard deviations and $r_{12}$ the correlation coefficient, then in this
case $\xi = s_1^2 s_2^2(1 - r_{12}^2)$.


The author has thus far been unable to obtain $D(\xi)$ explicitly for larger values
of $n$.
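The reason the case $n = 2$ is tractable can be made explicit with the duplication formula for the Gamma function. By it, the moments (10) for $n = 2$ coincide with those of a variable $\xi$ for which $2\sqrt{A_2\xi}$ follows a Gamma law of index $N - 2$. The sketch below checks that agreement numerically. The function names are illustrative only, and the value of $A_2$ is arbitrary.

```python
import math

def moment_from_10(k, N, A2):
    """k-th moment of xi from (10) with n = 2."""
    return A2 ** (-k) * math.exp(
        math.lgamma((N - 1) / 2 + k) + math.lgamma((N - 2) / 2 + k)
        - math.lgamma((N - 1) / 2) - math.lgamma((N - 2) / 2))

def moment_from_gamma(k, N, A2):
    """k-th moment of xi when 2*sqrt(A2*xi) is a Gamma variable of index N - 2."""
    return (4 * A2) ** (-k) * math.exp(
        math.lgamma(N - 2 + 2 * k) - math.lgamma(N - 2))

if __name__ == "__main__":
    for k in (0.5, 1, 2, 3.5):
        print(k, moment_from_10(k, 12, 3.7), moment_from_gamma(k, 12, 3.7))
```

The two columns agree for every $k$ tried, including fractional values, which is what one would expect if the density of $\xi$ has the closed form quoted for $n = 2$.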

In practical work it may be desirable to use the $n$th root of the generalized
variance, which would be the geometric mean of the variances of the $n$ variates
multiplied by the $n$th root of the determinant of correlations among the $n$ variables.
In this case the $k$th moments can be found from (10) by substituting $\frac{k}{n}$ for $k$. The
distribution of the square root of the generalized variance for bivariate samples can
be found from (12 b) by setting $\xi = t^2$, thus obtaining

$$g(t) = \frac{2^{N-2}A_2^{\frac{N-2}{2}}}{\Gamma(N-2)}\;t^{N-3}\,e^{-2\sqrt{A_2}\,t} \qquad (12c).$$

Again, it may be important in certain situations to take the $2n$th root of the
generalized variance, which would be the geometric mean of the standard deviations
of the $n$ variates multiplied by the $2n$th root of the determinant of correlations. In
this case the moments can be found by substituting $\frac{k}{2n}$ for $k$.

4. Moments and Distribution of the Ratio of independent generalized Variances.
The statistical coefficient to be considered here is a generalization of the ratio of
two independently distributed variances whose distribution in samples of a single
variate is used extensively by Fisher* in his analysis of variance.

Let us denote by $\xi$ and $\eta$ the generalized variances in two samples from $n$-variate
populations in which the generalized variances are $\alpha$ and $\beta$ respectively. If we let
$\frac{\xi}{\eta} = \psi$, then since $\xi$ and $\eta$ are independent, the $k$th moment of $\psi$ can be deduced
from (10) as

$$M_k(\psi) = M_k(\xi)\,M_{-k}(\eta) = \left(\frac{B}{A}\right)^k \prod_{i=1}^{n} \frac{\Gamma\left(\frac{M-i}{2}+k\right)\Gamma\left(\frac{N-i}{2}-k\right)}{\Gamma\left(\frac{M-i}{2}\right)\Gamma\left(\frac{N-i}{2}\right)} \qquad (13),$$

where $A = \dfrac{M^n}{2^n\alpha}$, $B = \dfrac{N^n}{2^n\beta}$, $M$ and $N$ being the numbers of individuals in the samples.
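A short sketch of (13) follows, with a check against the single-variate case, where $\psi$ is an ordinary ratio of two independent sample variances. The function name and the numerical values are illustrative assumptions, not taken from the paper. The guard on $k$ reflects that the moment exists only while every Gamma argument stays positive.

```python
import math

def ratio_moment(k, n, M, N, alpha, beta):
    """k-th moment of psi = xi/eta according to (13).

    M, N: sample sizes; alpha, beta: population generalized variances.
    The moment exists only when (N - n)/2 > k.
    """
    if (N - n) / 2 <= k:
        raise ValueError("moment does not exist for this k")
    A = M ** n / (2 ** n * alpha)
    B = N ** n / (2 ** n * beta)
    log_g = sum(math.lgamma((M - i) / 2 + k) - math.lgamma((M - i) / 2)
                + math.lgamma((N - i) / 2 - k) - math.lgamma((N - i) / 2)
                for i in range(1, n + 1))
    return (B / A) ** k * math.exp(log_g)

if __name__ == "__main__":
    # n = 1: mean of a ratio of two sample variances (divisors M and N)
    print(ratio_moment(1, 1, 8, 20, 2.0, 5.0))
    print((20 / 8) * (2.0 / 5.0) * (8 - 1) / (20 - 3))
```

For $n = 1$ the first moment reduces to $(N/M)(\alpha/\beta)(M-1)/(N-3)$, the familiar mean of a variance ratio when both variances are computed with the sample size as divisor.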

The distribution of $\psi$ can be readily derived from the distributions of $\xi$ and $\eta$,
using the form given by (1). Accordingly, we have as the joint distribution of
$\xi$ and $\eta$,


M-n W-u M-n-t A T -*-2 


f® r® r® i_i 

v * I ... MO* M.)* -.(UU 

Jo Jo Jo 




KA 2 B * £ 






xe 14), 

* R. A. Fisher: Proceedings of the International Mathematical Congress, Vol. II. Toronto (1924).






where 

K- 




7TH 


, dT» dtidtf ... dt n ~i, d& «= d8xd6 %... d0*_i. 


Making the transformation 


- = (» = 1 , 2 ,...«- 1 ), 
V -l — *i 


and integrating with respect to the $\theta$'s and $\eta$, we can write the distribution of $\psi$ in
the following form:

F(f ). H f 1 f\.. f : \a (1 - z )(1 -*)(1 - s t ) ... (1 -s„-i) + Bzs l8l ... 

JO JO Jo 


A+l 


n+2 


2»-l 


x[*l(l-*i)] *“[#*(1 — **)] ” * ...Ml-Uf * da\d8%... ds„_i 

.(15), 


where 

'iir^ * 

*«o \ Z / 


•-i /«-*> *- 


Jf -*-2 
yfr ~ * ' 


, 3/ + JV *—2 

d=-x-, * = 


( 1 +*)<*-**’ " 2 l+* # 

Without loss of generality, we can assume that 2 ? < 3. For, if 3 < B we can make 

yfr 


the transformation = 1 — #*•, z =* 1 — i, where £ = 


1 +* 


, and get the desired form* 


If we denote by e , then we can write 

^(M + N- 1 — 

1 l 9 J JV r —« 3f-n~ 


r VT'') r (' f 1 ). 


2 7 y~* M-n-2 ( M+n \ 

TTTw* 2 * 8 0 + *)“' 2 




(1 -«,).. .(! - * n -i) 
1 + ^ 


1 - ( 1 -a) ( 1 -».)•■• ( 1 !a Y^ ri J f 




Jf+tf- 2 »-l 


jf+jy- n-a jir+iv-H-4 _ 

x[«x(l-«i)] ~ r [»>(!-».)] * ...[s^iCl-«»_!)] * 

From the expression in the brace in the integrand, (we have 

1 -(1 - «l)(1 (1 - *n-l) - e8l8 *+ ^"~ X 


< 2 *i dif- dSn-i 
...(16). 


(1 -1) (1 ~ *>) • ••( j~ fn—l) 


and 


’<1 


1 +t 

(!-*)(! 

1 + t 

for and >0. Hence, the quantity in the brace can be expanded into 

a double infinite series which is uniformly convergent in the s’s in the region of 
integration, and the integrations can be performed term by term. 







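For practical purposes, however, we can find the distribution of $\psi$ for $n = 2$ by
means of (12 b). Indeed, the joint distribution of $\xi$ and $\eta$ for this case is

$$\frac{2^{M+N-6}A_2^{\frac{M-2}{2}}B_2^{\frac{N-2}{2}}\;\xi^{\frac{M-4}{2}}\eta^{\frac{N-4}{2}}\;e^{-2\sqrt{A_2\xi}-2\sqrt{B_2\eta}}}{\Gamma(M-2)\,\Gamma(N-2)} \qquad (17),$$

where $A_2$ and $B_2$ are the values of $A$ and $B$ in (13) for $n = 2$. Setting $\frac{\xi}{\eta} = \psi$ and integrating with respect to $\eta$, we get

$$F(\psi) = \frac{\Gamma(M+N-4)\,A_2^{\frac{M-2}{2}}B_2^{\frac{N-2}{2}}\,\psi^{\frac{M-4}{2}}}{2\,\Gamma(M-2)\,\Gamma(N-2)\left(\sqrt{A_2\psi}+\sqrt{B_2}\right)^{M+N-4}} \qquad (18).$$

If the two samples are from populations with $\alpha = \beta$, then $F(\psi)$ will only involve
$M$, $N$ and $\psi$. The condition $\alpha = \beta$, however, does not imply that the standard
deviations and the correlations of the two variates for the two populations are
identical.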

The foregoing analysis can be extended to two samples drawn from populations
with different numbers of variates. The moments of the ratio of the two generalized
variances can be readily inferred from (13). To consider the simplest case of this
kind, let $\xi$ be the variance in a sample of one variate and $\eta$ the generalized variance
in a sample of two variates. It is reasonable to use as our statistical coefficient the
ratio $\frac{\xi}{\sqrt{\eta}} = \theta$, instead of $\frac{\xi}{\eta}$. The distribution of $\theta$ is

$$f(\theta) = \frac{2^{N-2}\,\Gamma\left(\frac{M+2N-5}{2}\right)A_1^{\frac{M-1}{2}}B_2^{\frac{N-2}{2}}\;\theta^{\frac{M-3}{2}}}{\Gamma\left(\frac{M-1}{2}\right)\Gamma(N-2)\left(A_1\theta+2\sqrt{B_2}\right)^{\frac{M+2N-5}{2}}} \qquad (19).$$


5. Ratio of independent Generalized Variance to any of its principal Minors.
Here we shall consider the moments and distribution of the ratio of $|a_{ij}|$ in (7) to
any one of its principal minors of $t$th order. Without loss of generality, we can
take as our minor the one standing in the upper left corner of $|a_{ij}|$, because any
principal minor can be shifted to that position by proper interchange of rows and
columns in the determinant accompanied by similar changes in $\Delta$ to maintain the
usual correspondence between the statistical coefficients and population parameters.
Denote the ratio

$$\frac{|a_{ij}|}{|a_{pq}|}, \qquad (i, j = 1, 2, \dots n;\; p, q = 1, 2, \dots t,\; t < n),$$

by $\phi$. The $k$th moment of $\phi$ is

$$M_k(\phi) = \frac{A^{\frac{N-1}{2}}}{\pi^{\frac{n(n-1)}{4}}\prod_{i=1}^{n}\Gamma\left(\frac{N-i}{2}\right)}\int e^{-\sum_{i,j=1}^{n}A_{ij}a_{ij}}\;|a_{ij}|^{\frac{N-n-2}{2}+k}\,|a_{pq}|^{-k}\,da \qquad (20).$$


We remark that the result of integrating (7) with respect to $a_{in}$ $(i = 1, 2, \dots n)$ is
to reduce it to the distribution of the variances and covariances of the first $n - 1$
variates of the sample. If we integrate further with respect to

$$a_{i,\,n-1} \qquad (i = 1, 2, \dots n-1),$$









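we reduce it to the case of an $(n-2)$-variate population. If the process be performed
$n - t$ times, the distribution is reduced to the case of a $t$-variate population. By
argument similar to that used in deducing (10) from (9) we find

$$\int e^{-\sum_{i,j=1}^{n}A_{ij}a_{ij}}\,|a_{ij}|^{\frac{N-n-2}{2}+k}\,|a_{pq}|^{-k}\,da_{n-t} = \pi^{\frac{n(n-1)}{4}-\frac{t(t-1)}{4}}\prod_{i=t+1}^{n}\Gamma\left(\frac{N-i}{2}+k\right)\left(\frac{|A^{(t)}_{pq}|}{|A_{ij}|}\right)^{\frac{N-1}{2}+k} e^{-\sum_{p,q=1}^{t}A^{(t)}_{pq}a_{pq}}\,|a_{pq}|^{\frac{N-t-2}{2}} \qquad (21),$$

where the integration is performed with respect to all variables except those
contained in $|a_{pq}|$ $(p, q = 1, 2, \dots t)$, $A^{(t)}_{pq} = \dfrac{N\Delta^{(t)}_{pq}}{2\sigma_p\sigma_q\Delta^{(t)}}$, $\Delta^{(t)}$ is the principal minor of
order $t$ in the upper left corner of $\Delta$ and $\Delta^{(t)}_{pq}$ is the co-factor corresponding to $\rho_{pq}$
in $\Delta^{(t)}$. If (21) be integrated with respect to the variables $a_{pq}$ $(p, q = 1, 2, \dots t)$ and
the result multiplied by the constant factor standing before the integral in (20),
we obtain

$$M_k(\phi) = B^{-k}\prod_{i=t+1}^{n}\frac{\Gamma\left(\frac{N-i}{2}+k\right)}{\Gamma\left(\frac{N-i}{2}\right)},$$

where

$$B = \frac{N^{n-t}\,\Delta^{(t)}}{2^{n-t}\,\sigma_{t+1}^2\cdots\sigma_n^2\,\Delta}.$$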


Therefore, from (2) the distribution $f(\phi)$ is


A’—n iV—w 


B * d> 2 rooroo roo 

11 -L* 

< = 7+1 \ 2 j 


x e ei ri *»-*-* «*-* l dvidvt...dv n -(-i 


When $t = n - 1$, we have the distribution of the variance of the difference between
the $n$th variate and its estimate from the regression "plane" of the remaining
$n - 1$ variates, that is

$$f(\phi) = \frac{B_1^{\frac{N-n}{2}}}{\Gamma\left(\frac{N-n}{2}\right)}\;\phi^{\frac{N-n}{2}-1}\,e^{-B_1\phi} \qquad (22a),$$

where $B_1 = \dfrac{N}{2\sigma_n^2(1-R^2)}$ and $R$ is the multiple correlation coefficient between the
$n$th variate and the first $n - 1$ variates.








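For $t = n - 2$

$$f(\phi) = \frac{2^{N-n-1}B_2^{\frac{N-n}{2}}}{\Gamma(N-n)}\;\phi^{\frac{N-n}{2}-1}\,e^{-2\sqrt{B_2\phi}} \qquad (22b),$$

where

$$B_2 = \frac{N^2\Delta^{(n-2)}}{4\sigma_{n-1}^2\sigma_n^2\Delta}.$$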


6. Generalization of the Correlation Ratio. Let $p$ samples $\omega_\beta$ $(\beta = 1, 2, \dots p)$ of
$n_\beta$ items respectively be drawn from a normal population of one variable and let
$\bar{x}_\beta$ and $s_\beta^2$ be the mean and variance of $\omega_\beta$. Let $\Omega$ be the sample formed by pooling
the $\omega$'s and let its mean and variance be denoted by $\bar{x}$ and $S^2$. The statistical
coefficient $\eta^2$, defined as

$$\eta^2 = \frac{\sum_{\beta=1}^{p} n_\beta(\bar{x}_\beta - \bar{x})^2}{NS^2}, \qquad \left(N = \sum_{\beta=1}^{p} n_\beta\right) \qquad (23),$$

is known as the correlation ratio with the samples $\omega_\beta$ forming the $p$ categories of
the independent variable. The distribution of $\eta^2$ defined in this manner was first
found by Fisher* by his analysis of variance, and later by Hotelling† by a different
method.

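In this section we shall generalize the above definition of $\eta^2$ for samples from
an $n$-variate normal population, and find its moments and distribution. We note
from (23) that $\eta^2$ is the ratio of the weighted variance of the means of the $p$ sub-samples
$\omega_\beta$ to the variance of $\Omega$. Now, let us suppose $p$ samples $\omega_\beta'$ $(\beta = 1, 2, \dots p)$
of $n_\beta$ items respectively to be drawn from an $n$-variate normal population. Let the
sample formed by pooling the $\omega'$'s be $\Omega'$, which will have $\sum_{\beta=1}^{p} n_\beta = N$ items. The
statistical coefficient to be considered is the ratio of the generalized weighted variance
of the means of the $\omega'$'s to the generalized variance of $\Omega'$. That is

$$U = \frac{|b_{ij}|}{|a_{ij}|},$$

where

$$b_{ij} = \frac{1}{N}\sum_{\beta=1}^{p} n_\beta(\bar{x}_{i\beta} - \bar{x}_i)(\bar{x}_{j\beta} - \bar{x}_j),$$

and

$$a_{ij} = \frac{1}{N}\sum_{\beta=1}^{p}\sum_{\alpha=1}^{n_\beta}(x_{i\beta\alpha} - \bar{x}_i)(x_{j\beta\alpha} - \bar{x}_j),$$

where $\bar{x}_{i\beta}$ is the mean of the $i$th variate in the $\beta$th sample and $x_{i\beta\alpha}$ is the value of
the $\alpha$th individual for the $i$th variate in the $\beta$th sample $\omega_\beta'$. We observe that $a_{ij}$
can be written as $b_{ij} + c_{ij}$, where

$$c_{ij} = \frac{1}{N}\sum_{\beta=1}^{p}\sum_{\alpha=1}^{n_\beta}(x_{i\beta\alpha} - \bar{x}_{i\beta})(x_{j\beta\alpha} - \bar{x}_{j\beta}) \qquad (24),$$

or setting

$$\sum_{\alpha=1}^{n_\beta}(x_{i\beta\alpha} - \bar{x}_{i\beta})(x_{j\beta\alpha} - \bar{x}_{j\beta}) = n_\beta v_{ij\beta},$$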

* R. A. Fisher: Proceedings of the International Mathematical Congress, Vol. II. Toronto (1924).
† H. Hotelling: "The Distribution of Correlation Ratios calculated from random Data," Proceedings
of the National Academy of Sciences, Vol. XI. No. 10 (1925), pp. 657—662.








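we have

$$c_{ij} = \frac{1}{N}\sum_{\beta=1}^{p} n_\beta v_{ij\beta}, \qquad (i, j = 1, 2, \dots n) \qquad (24a).$$

Clearly $v_{ij\beta}$ is the general element in the determinant of variances and covariances
in the sample $\omega_\beta'$. The moments of $|a_{ij}|$ can be written down at once from (10) as

$$M_k(|a_{ij}|) = A^{-k}\prod_{i=1}^{n}\frac{\Gamma\left(\frac{N-i}{2}+k\right)}{\Gamma\left(\frac{N-i}{2}\right)}.$$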

It is well known that the system of means is distributed independently of the
system of variances and covariances in samples from an $n$-variate normal population.
Therefore, the system $\{b_{ij}\}$* is distributed independently of the set $\{c_{ij}\}$. The
means in a sample of $N$ individuals are distributed according to

$$\frac{A^{\frac{1}{2}}}{\pi^{\frac{n}{2}}}\;e^{-\sum_{i,j=1}^{n}A_{ij}(\bar{x}_i - m_i)(\bar{x}_j - m_j)}\,d\bar{x} \qquad (25),$$

where $A_{ij}$ is given in (7) and $m_i$ is the mean of $x_i$ in the population, which, except
for a factor $N$ in each $A_{ij}$, is the distribution of the parent population. Therefore,
the distribution of the set of statistical coefficients $\{b_{ij}\}$ can be deduced from (7),
for $p > n$, as

$$\frac{A^{\frac{p-1}{2}}}{\pi^{\frac{n(n-1)}{4}}\prod_{i=1}^{n}\Gamma\left(\frac{p-i}{2}\right)}\;e^{-\sum_{i,j=1}^{n}A_{ij}b_{ij}}\;|b_{ij}|^{\frac{p-n-2}{2}}\,db \qquad (26).$$

The distribution of the variances and covariances $\{v_{ij\beta}\}$ in (24 a) is given by (7)
with $N$ replaced by $n_\beta$. Hence, it can be shown without much difficulty that the
distribution of the set $\{c_{ij}\}$ is given by

$$\frac{A^{\frac{N-p}{2}}}{\pi^{\frac{n(n-1)}{4}}\prod_{i=1}^{n}\Gamma\left(\frac{N-p+1-i}{2}\right)}\;e^{-\sum_{i,j=1}^{n}A_{ij}c_{ij}}\;|c_{ij}|^{\frac{N-p-n-1}{2}}\,dc \qquad (27).$$

One way to prove (27) is to evaluate the characteristic function of the set $\{c_{ij}\}$ from
the distributions of the quantities $\{v_{ij\beta}\}$, $(\beta = 1, 2, \dots p)$. Indeed, we define the
characteristic function $\phi(\alpha)$ of $\{c_{ij}\}$ as


*«>- 


pn(n-l) 

r a 


4 — II_ 

-i\J 


h n rf^Y 

1=1 i-l \ * / 


p H-y* 

x fi l^l -5- dV ...(28), 

fiml 


where $dV$ is the product of the differentials of all the $v$'s, $\alpha$ is the set $\{\alpha_{ij}\}$, $\delta_{ij}$ is the
Kronecker delta which is unity for $i = j$ and zero for $i \neq j$, $A_{ij\beta} = \dfrac{n_\beta}{N}A_{ij}$ and $\alpha_{ij} = \alpha_{ji}$.
This integral breaks up into $p$ integrals, each of the form (8). Applying the results
of (8) on each of the integrals, we get

* The brace notation, { }, will be used to indicate a system or aggregate, as distinct from the
notation | | used to indicate a determinant.

x-r> ay * 

^(5) = | Ay | 2 Ay 2-Sn 


.(29). 


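This is clearly the characteristic function belonging to (27). Hence (27) is the
distribution of the system $\{c_{ij}\}$. We are now in a position to state that

$$\int e^{-\sum A_{ij}(b_{ij}+c_{ij})}\;|a_{ij}|^{k}\,|b_{ij}|^{\frac{p-n-2}{2}}\,|c_{ij}|^{\frac{N-p-n-1}{2}}\,db\,dc = \frac{A^{-k}}{H}\prod_{i=1}^{n}\frac{\Gamma\left(\frac{N-i}{2}+k\right)}{\Gamma\left(\frac{N-i}{2}\right)} \qquad (30),$$

where

$$H = \frac{A^{\frac{N-1}{2}}}{\pi^{\frac{n(n-1)}{2}}\prod_{i=1}^{n}\Gamma\left(\frac{p-i}{2}\right)\Gamma\left(\frac{N-p+1-i}{2}\right)},$$

which is the product of the constant coefficients of the distributions of the two sets
of statistical coefficients $\{b_{ij}\}$ and $\{c_{ij}\}$. If in (30) we set $k = -h$, $p = p + 2h$ and
$N = N + 2h$, afterwards multiplying by $H$, we have for the $h$th moment of $U$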


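$$M_h(U) = \prod_{i=1}^{n}\frac{\Gamma\left(\frac{p-i}{2}+h\right)\Gamma\left(\frac{N-i}{2}\right)}{\Gamma\left(\frac{p-i}{2}\right)\Gamma\left(\frac{N-i}{2}+h\right)}.$$

The distribution $\phi(U)$ of $U$ satisfies an integral equation of type (B) and from (5)
we have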

n / N — i\ P T n ~J n(N~p) 

n r(~A) u 2 (i -u) » 






i ji ri N ~p -1 

... (ViVt ... 2 

0 Jo J 0 


(H— 1 )(aV-JO> l 


( n~2)(N~p ) 


x(i-m) 2 ( 1 -t,) 2 ...(i-« B _i) 2 

x[l-{v 1 + v i .(\-v 1 )}(l-U )]~(~^ J )... 


N-P 


- 1 \ 


J) 

• x [1 - |i»i 4* v 2 . (1 — Vi) 4-... + tfn-i«(I “ Vi) ... (1 — fl n - 2 )} (1 — v/ ;J ' 

x dvidvt ... dv n -1 .(31). 

The range of U is from 0 to 1, since B = 1 in (5) for this case. 

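For $n = 1$ we get

$$f_1(U) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{p-1}{2}\right)\Gamma\left(\frac{N-p}{2}\right)}\;U^{\frac{p-3}{2}}(1-U)^{\frac{N-p-2}{2}} \qquad (31a),$$

the well-known distribution of $\eta^2$ obtained by Fisher and Hotelling.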
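The single-variate case lends itself to a quick simulation. The sketch below computes $\eta^2$ as defined in (23) for groups drawn from one normal population and compares the simulated mean with $(p-1)/(N-1)$, the first moment obtained from $M_h(U)$ with $n = 1$, $h = 1$. The group sizes, the seed and the function names are illustrative assumptions.

```python
import random

def correlation_ratio_sq(groups):
    """eta^2 of (23): weighted variance of group means over total variance."""
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    total = sum((x - grand) ** 2 for g in groups for x in g)  # equals N * S^2
    return between / total

def mean_eta_sq(sizes, reps, seed=0):
    """Monte Carlo mean of eta^2 when all groups come from one normal population."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(reps):
        groups = [[rng.gauss(0, 1) for _ in range(m)] for m in sizes]
        acc += correlation_ratio_sq(groups)
    return acc / reps

if __name__ == "__main__":
    sizes = (4, 5, 6)                      # p = 3 groups, N = 15
    print(mean_eta_sq(sizes, 2000), (len(sizes) - 1) / (sum(sizes) - 1))
```

With these sizes the theoretical mean is $2/14$, and the simulated value should settle near it as the number of repetitions grows.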








For n « 2 


* 00 - 


r rr^) r (t ) ^ (1 - U)N ~ P _1 

r ( e i J ) r ( p T 2 ) r, (V) 

fl ^-P 1 _ 

X JMl-t*)] * [l-^l-ET)] V * idv! 


*<&)*&)**-* ( 2 ' 2 ' : w 


7. Generalization of $1 - \eta^2$. In the case of samples of one variable the distribution
of $1 - \eta^2$ can be found from that of $\eta^2$ by a simple change of variable, but
such is not true in the generalization which we shall consider.

From (23) we find that

$$1 - \eta^2 = \frac{\sum_{\beta=1}^{p} n_\beta s_\beta^2}{NS^2} \qquad (32),$$

that is, the ratio of the weighted mean of the variances of the samples $\omega_\beta$ to the
variance of $\Omega$. The quantity which we shall consider as a generalization of $1 - \eta^2$ is

$$W = \frac{|c_{ij}|}{|a_{ij}|} \qquad (33),$$

where $\{c_{ij}\}$ and $\{a_{ij}\}$ have been defined in Section 6.

It can be shown that $W$ arises as the maximum likelihood criterion $\lambda_H$ of the
type used by Neyman and E. S. Pearson for testing an hypothesis $H$ that $p$ samples
$\omega_\beta$ are from a subclass $d$ of a class $D$ of admissible populations. In the present
case $D$ is the class of all sets of $p$ $n$-variate normal populations in each set of which
the corresponding variances and covariances are the same, but the means are
completely independent, while $d$ is the subclass in each set of which the means
are the same, that is to say in each set of which the $p$ populations are identical.
The maximum of the $L$-function of the samples $\omega_\beta$ $(\beta = 1, 2, \dots p)$ for the populations
of class $D$ is

$$M_D = J\,|c_{ij}|^{-\frac{N}{2}}\prod_{\beta=1}^{p}|v_{ij\beta}|^{\frac{n_\beta-n-2}{2}}.$$

The maximum of the $L$-function of the samples from populations of class $d$ is

$$M_d = J\,|a_{ij}|^{-\frac{N}{2}}\prod_{\beta=1}^{p}|v_{ij\beta}|^{\frac{n_\beta-n-2}{2}},$$

where $J$ is a constant depending on $n_\beta$ $(\beta = 1, 2, \dots p)$ and $n$. Therefore

$$\lambda_H = \frac{M_d}{M_D} = W^{\frac{N}{2}}.$$


Biometrika xxiv 


31 








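The $h$th moment of $W$, when $H$ is true, is found from (30) by setting $k = -h$,
$N = N + 2h$, and then multiplying by

$$\frac{A^{\frac{N-1}{2}}}{\pi^{\frac{n(n-1)}{2}}\prod_{i=1}^{n}\Gamma\left(\frac{p-i}{2}\right)\Gamma\left(\frac{N-p+1-i}{2}\right)}.$$

Accordingly, we get

$$M_h(W) = \prod_{i=1}^{n}\frac{\Gamma\left(\frac{N-p+1-i}{2}+h\right)\Gamma\left(\frac{N-i}{2}\right)}{\Gamma\left(\frac{N-p+1-i}{2}\right)\Gamma\left(\frac{N-i}{2}+h\right)} \qquad (34).$$

The distribution of $W$ is clearly of the form (5) and can be written as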

t*(p-D 




n / N— i\ S-JEL. . n * - „i "ViT zi .i 

n/rrj ? 2 (i-w) * 


to / 

II r(: 

»=i \ 


jr-p + i-A^/p-i 


) r '(v) 


ff... ]* (%*• -Vn-l) 
JoJo Jo 


p —2 
2 


(w-lHp-1) (n—-2) (p—1 ) f p-1 

x ( 1 ~ " 


-1 ... 


(l-*> 2 _ 2 2 [l-i*(l-F)J 2 

x[l-{t* + r».(l-fli)}(l- TPXf~... 

P -2 

x [1 — {yi+ ^.(l — vi)+ ... -f r n -i.(I — vi).(l -i> a )... (1 — ^n- 2 )} (1 ~ W)] 2 

x dv!dv 2 ... dv n ~1 .( 35 ). 

The range of W is from 0 to 1 . 
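A small numerical sketch may help to fix ideas about $W$. It evaluates the product-of-Gamma-ratios form of the $h$th moment of $W$ under $H$, and compares the first moment with a Monte Carlo estimate for bivariate samples drawn from identical populations. The function names, group sizes and seed are hypothetical. The common factor $1/N$ in $c_{ij}$ and $a_{ij}$ cancels in the ratio, so raw sums of squares and products are used.

```python
import math
import random

def w_moment(h, n, N, p):
    """h-th moment of W = |c_ij|/|a_ij| when the p populations are identical."""
    return math.exp(sum(
        math.lgamma((N - p + 1 - i) / 2 + h) - math.lgamma((N - p + 1 - i) / 2)
        + math.lgamma((N - i) / 2) - math.lgamma((N - i) / 2 + h)
        for i in range(1, n + 1)))

def mean_of(points):
    m = len(points)
    return (sum(x for x, _ in points) / m, sum(y for _, y in points) / m)

def scatter(points, centre):
    """Sums of squares and products about `centre`, as (s11, s22, s12)."""
    s11 = sum((x - centre[0]) ** 2 for x, _ in points)
    s22 = sum((y - centre[1]) ** 2 for _, y in points)
    s12 = sum((x - centre[0]) * (y - centre[1]) for x, y in points)
    return s11, s22, s12

def simulate_w(sizes, reps, seed=0):
    """Monte Carlo mean of W for n = 2 variates."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(reps):
        groups = [[(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(m)]
                  for m in sizes]
        c = [0.0, 0.0, 0.0]
        for g in groups:                       # within-sample scatter
            for j, s in enumerate(scatter(g, mean_of(g))):
                c[j] += s
        pooled = [pt for g in groups for pt in g]
        a = scatter(pooled, mean_of(pooled))   # scatter of the pooled sample
        acc += (c[0] * c[1] - c[2] ** 2) / (a[0] * a[1] - a[2] ** 2)
    return acc / reps

if __name__ == "__main__":
    print(w_moment(1, 2, 18, 3), simulate_w((5, 6, 7), 1000))
```

For $h = 1$ the moment reduces to $\prod_i (N-p+1-i)/(N-i)$, which for $n = 2$, $N = 18$, $p = 3$ is $(15/17)(14/16)$.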

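For $n = 1$ we find

$$\phi_1(W) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{N-p}{2}\right)\Gamma\left(\frac{p-1}{2}\right)}\;W^{\frac{N-p-2}{2}}(1-W)^{\frac{p-3}{2}} \qquad (35a),$$

the distribution of $1 - \eta^2$.
For $n = 2$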




*(H 0 '- 7Tt - 

r N ~P 




W 2 (1 - IF) 


P-2 


r(^) r (^ii) r ( P _ i ) 

x ^ , £Z_ X , p _ 1,1 - Tf].(35 6 ). 

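At this point we remark that since the elements $\{b_{ij}\}$ are distributed independently
of the quantities $\{c_{ij}\}$ and since the distribution of each system is
essentially of the form (7), we can deduce from (13) that the $k$th moment of the
ratio $|b_{ij}|/|c_{ij}|$ is

$$\prod_{i=1}^{n}\frac{\Gamma\left(\frac{p-i}{2}+k\right)\Gamma\left(\frac{N-p+1-i}{2}-k\right)}{\Gamma\left(\frac{p-i}{2}\right)\Gamma\left(\frac{N-p+1-i}{2}\right)}$$

and its distribution can be found from (16).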







8. Generalization of "Student's" Ratio. For samples of a single variable
"Student's" ratio is defined as the quantity

$$z = \frac{\bar{x} - m}{s},$$

where $s$ is the standard deviation, $\bar{x}$ the mean of the sample and $m$ the population
mean. In a recent paper, Hotelling* has generalized the statistical coefficient $z$
for samples from an $n$-variate normal population and has found the distribution
of $T^2$ which is the product of the generalized $z^2$ and the number of degrees of
freedom in the sample. However, we shall show that the distribution of this
generalized ratio can be reached by the methods used in Sections 6 and 7.

In a sample of $N$ individuals from an $n$-variate normal population, let the set
of variances and covariances $\{a_{ij}\}$ be defined as in Section 3. Let the sample means
be $\{\bar{x}_i\}$ and the corresponding population means be $\{m_i\}$. The distribution of the
set $\{a_{ij}\}$ is given by (7) and that of $\{\bar{x}_i\}$ by (25). The statistical coefficient which we
shall consider first is

$$Y = \frac{|a_{ij}|}{|e_{ij}|},$$

where $e_{ij} = a_{ij} + (\bar{x}_i - m_i)(\bar{x}_j - m_j)$.

It is not difficult to show that $Y$ arises as the maximum likelihood criterion
$\lambda_{H'}$ for testing the hypothesis $H'$ that our sample is from a subset $d'$ of $n$-variate
populations $D'$, where $d'$ is the class of normal populations having a specified set
of means $\{m_i\}$ and any set of variances and covariances, and $D'$ is the class of all
$n$-variate normal populations with any set of means, variances and covariances.
Proceeding as in Section 6 for $\lambda_H$, we find

$$\lambda_{H'} = Y^{\frac{N}{2}}.$$

By the procedure used in finding the distribution of $\{c_{ij}\}$ in (27), we can show
that the distribution of the set $\{e_{ij}\}$ is

$$\frac{A^{\frac{N}{2}}}{\pi^{\frac{n(n-1)}{4}}\prod_{i=1}^{n}\Gamma\left(\frac{N+1-i}{2}\right)}\;e^{-\sum_{i,j=1}^{n}A_{ij}e_{ij}}\;|e_{ij}|^{\frac{N-n-1}{2}}\,de.$$


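Therefore, the $k$th moment of $|e_{ij}|$ is the same as that of $\xi$ in (10) with $N$
replaced by $N + 1$ in the Gamma functions. Since the means $\{\bar{x}_i\}$ and the system $\{a_{ij}\}$ are independently
distributed, we shall have

$$\int e^{-\sum A_{ij}e_{ij}}\;|e_{ij}|^{k}\,|a_{ij}|^{\frac{N-n-2}{2}}\,da\,d\bar{x} = A^{-k-\frac{N}{2}}\,\pi^{\frac{n(n+1)}{4}}\;\frac{\Gamma\left(\frac{N-n}{2}\right)}{\Gamma\left(\frac{N}{2}\right)}\prod_{i=1}^{n}\Gamma\left(\frac{N+1-i}{2}+k\right).$$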


* H. Hotelling: Annals of Mathematical Statistics, Vol. II. (1931).







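Changing $N$ to $N + 2h$ and $k$ to $-h$ and multiplying by

$$\frac{A^{\frac{N}{2}}}{\pi^{\frac{n(n+1)}{4}}\prod_{i=1}^{n}\Gamma\left(\frac{N-i}{2}\right)},$$

we get

$$M_h(Y) = \frac{\Gamma\left(\frac{N}{2}\right)\Gamma\left(\frac{N-n}{2}+h\right)}{\Gamma\left(\frac{N-n}{2}\right)\Gamma\left(\frac{N}{2}+h\right)}.$$

Hence, from (5), the distribution of $Y$ is

$$\frac{\Gamma\left(\frac{N}{2}\right)}{\Gamma\left(\frac{N-n}{2}\right)\Gamma\left(\frac{n}{2}\right)}\;Y^{\frac{N-n-2}{2}}(1-Y)^{\frac{n-2}{2}}\,dY \qquad (40),$$

with a range from 0 to 1.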
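The behaviour of $Y$ can be illustrated by simulation for $n = 2$. The sketch below computes $Y = |a_{ij}|/|e_{ij}|$ from a bivariate normal sample with known population means and compares the simulated mean with $(N-n)/N$, the first moment of a variable whose $h$th moment is a ratio of Gamma functions in $\frac{N-n}{2}$ and $\frac{N}{2}$. The helper `t_squared` uses the relation $Y = 1/(1 + T^2/(N-1))$. All names and numerical choices are illustrative assumptions.

```python
import random

def y_statistic(points, m):
    """Y = |a_ij| / |e_ij| for a bivariate sample with population means m."""
    N = len(points)
    xb = sum(x for x, _ in points) / N
    yb = sum(y for _, y in points) / N
    a11 = sum((x - xb) ** 2 for x, _ in points) / N
    a22 = sum((y - yb) ** 2 for _, y in points) / N
    a12 = sum((x - xb) * (y - yb) for x, y in points) / N
    d1, d2 = xb - m[0], yb - m[1]
    e11, e22, e12 = a11 + d1 * d1, a22 + d2 * d2, a12 + d1 * d2
    return (a11 * a22 - a12 ** 2) / (e11 * e22 - e12 ** 2)

def t_squared(y, N):
    """Generalized ratio recovered from Y through Y = 1 / (1 + T^2/(N-1))."""
    return (N - 1) * (1.0 / y - 1.0)

def mean_y(N, reps, seed=0):
    """Monte Carlo mean of Y when the hypothesis on the means is true."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(reps):
        pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]
        acc += y_statistic(pts, (0.0, 0.0))
    return acc / reps

if __name__ == "__main__":
    print(mean_y(12, 2000), (12 - 2) / 12)
```

Small values of $Y$ correspond to large values of $T^2$, so the lower tail of $Y$ plays the part of the upper tail of the generalized ratio.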

By breaking up the rows or columns of $|e_{ij}|$ and expressing $|e_{ij}|$ as a sum of
the resulting determinants, it can be readily shown that

$$Y = \frac{1}{1 + \dfrac{T^2}{N-1}},$$

where

$$T^2 = (N-1)\sum_{p,q=1}^{n}\frac{\bar{a}_{pq}}{|a_{ij}|}(\bar{x}_p - m_p)(\bar{x}_q - m_q),$$

and $\bar{a}_{pq}$ is the co-factor of $a_{pq}$ in $|a_{ij}|$.

Making the change of variable from $Y$ to $T$ in (40), we find the distribution
of $T$ to be

$$\frac{2\,\Gamma\left(\frac{N}{2}\right)}{(N-1)^{\frac{n}{2}}\,\Gamma\left(\frac{N-n}{2}\right)\Gamma\left(\frac{n}{2}\right)}\;\frac{T^{n-1}}{\left(1+\frac{T^2}{N-1}\right)^{\frac{N}{2}}}\,dT \qquad (41),$$

which is the distribution established by Hotelling.

We note that (41) has been derived without making use of the property of the
invariance of $T^2$ under all homogeneous linear transformations of the $n$ variates in
the population. This property, however, played an important part in Hotelling's
derivation.


9. Generalization of the $\lambda$-Criterion appropriate to $k$ Samples. In 1931, E. S.
Pearson and Neyman* considered a certain maximum likelihood criterion $\lambda_H$ for
testing the hypothesis $H$ that $k$ samples are drawn from a subclass $\omega$ of a class $\Omega$
of admissible populations, where $\Omega$ is the class of all sets of $k$ univariate normal
populations and $\omega$ is the subclass in each set of which the $k$ populations have the
same means and standard deviations. This criterion, for the case of $k$ samples of
one variable, was found in the following manner:

* J. Neyman and E. S. Pearson: Bulletin de l'Académie Polonaise des Sciences et des Lettres, Série A,
Sciences mathématiques, 1930 and 1931.

The $\lambda_{H^{(n)}}$ and $\lambda_{H'^{(n)}}$ of this Section and the $\lambda_H$ of Section 7 are respectively generalizations for the case
of $n$ variables of the $\lambda_H$, $\lambda_{H_1}$ and $\lambda_{H_2}$ used by these writers in the case of a single variable.

The maximum of the $L$-function of all $k$ samples from populations of class $\Omega$ is

JV k * 0 ~» 

0=1 

where $C$ is a constant depending on the $n_\beta$ $(\beta = 1, 2, \dots k)$, the numbers of individuals
in the samples, and $N = \sum_{\beta=1}^{k} n_\beta$, $s_\beta^2$ is the variance of the $\beta$th sample and $s_0^2$
the variance of the pool of the $k$ samples.

The maximum of the joint likelihood of the $k$ samples from populations of class
$\omega$ is

k *0 k * 0—3 

M„ = cn n (V)“- 

0=1 0=1 

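For the ratio of these maximums we have

$$\lambda_H = \prod_{\beta=1}^{k}\left(\frac{s_\beta^2}{s_0^2}\right)^{\frac{n_\beta}{2}},$$

which is the criterion adopted by E. S. Pearson and Neyman for testing $H$.

The generalization of this criterion for testing hypothesis $H$ on $k$ samples from an
$n$-variate normal population is straightforward. Indeed, the generalized criterion is

$$\lambda_{H^{(n)}} = \prod_{\beta=1}^{k}\left(\frac{|s_{ij\beta}|}{|s_{ij0}|}\right)^{\frac{n_\beta}{2}} \qquad (42),$$

where $|s_{ij\beta}|$ is the generalized variance of the $\beta$th sample and $|s_{ij0}|$ is the
generalized variance of the sample formed by combining the $k$ samples.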

To find the moments of $\lambda_{H^{(n)}}$, when the hypothesis is true, we proceed as in Sections 6
and 7, and deduce at once from (10) the $t$th moment of $|s_{ij0}|$ $\left(N = \sum_{\beta=1}^{k} n_\beta\right)$ to be

A y TT I \ " / /A o\ 


.m » 
a * n 
< = 1 




But, since the elements $\{s_{ij0}\}$ are expressible in terms of the variances, covariances
and means of the $k$ samples under consideration, we must have


r n k ... __ — u «p—n—2 Nt 

n | s ti „ |“i“ 10*1* dsdX 
J 0 = 1 


1-t n a,-?,**** n ” ( 2 ' 

- r(^ 







where $A_\beta$ is the determinant $A$ with $N$ replaced by $n_\beta$ and $A_{ij\beta} = \dfrac{n_\beta}{N}A_{ij}$. Setting
$n_\beta = n_\beta(1+h)$, $N = N(1+h)$ and $t = -\dfrac{Nh}{2}$, and afterwards multiplying (44) by


k 

IT 

3=1 


w 3 


_£** 

n(n~l),n n 

a ' « n 


]V~ « r * 

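we get for the $h$th moment of $\lambda_{H^{(n)}}$

$$M_h(\lambda_{H^{(n)}}) = \prod_{\beta=1}^{k}\left(\frac{N}{n_\beta}\right)^{\frac{n n_\beta h}{2}}\;\prod_{i=1}^{n}\left\{\frac{\Gamma\left(\frac{N-i}{2}\right)}{\Gamma\left(\frac{N(1+h)-i}{2}\right)}\prod_{\beta=1}^{k}\frac{\Gamma\left(\frac{n_\beta(1+h)-i}{2}\right)}{\Gamma\left(\frac{n_\beta-i}{2}\right)}\right\} \qquad (45).$$

For the case $n = 1$, we have

$$M_h(\lambda_H) = \prod_{\beta=1}^{k}\left(\frac{N}{n_\beta}\right)^{\frac{n_\beta h}{2}}\;\frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{N(1+h)-1}{2}\right)}\prod_{\beta=1}^{k}\frac{\Gamma\left(\frac{n_\beta(1+h)-1}{2}\right)}{\Gamma\left(\frac{n_\beta-1}{2}\right)},$$

which was found by E. S. Pearson and Neyman by direct integration.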

Let us modify the hypothesis which yielded the criterion $\lambda_{H^{(n)}}$ in (42), and
suppose that $\omega$ is the class of all sets of $n$-variate normal populations in each set
of which the corresponding variances and covariances are the same, but the means
are completely independent. Let the definition of $\Omega$ be unchanged. Then clearly
$\omega$ is a subclass of $\Omega$. Let the hypothesis that the $k$ samples are from the subclass $\omega$
of populations $\Omega$ be $H'^{(n)}$. Then the E. S. Pearson and Neyman criterion $\lambda_{H'^{(n)}}$
appropriate to $H'^{(n)}$ is readily found to be

$$\lambda_{H'^{(n)}} = \prod_{\beta=1}^{k}\left(\frac{|s_{ij\beta}|}{|c_{ij}|}\right)^{\frac{n_\beta}{2}},$$

where

$$c_{ij} = \frac{1}{N}\sum_{\beta=1}^{k} n_\beta s_{ij\beta}.$$

The distribution of the set $\{c_{ij}\}$ in this case is given by (27) with $p$ replaced by $k$.
Proceeding exactly as in the case of $\lambda_{H^{(n)}}$, we find that the $h$th moment of $\lambda_{H'^{(n)}}$ is

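$$M_h(\lambda_{H'^{(n)}}) = \prod_{\beta=1}^{k}\left(\frac{N}{n_\beta}\right)^{\frac{n n_\beta h}{2}}\;\prod_{i=1}^{n}\left\{\frac{\Gamma\left(\frac{N-k+1-i}{2}\right)}{\Gamma\left(\frac{N(1+h)-k+1-i}{2}\right)}\prod_{\beta=1}^{k}\frac{\Gamma\left(\frac{n_\beta(1+h)-i}{2}\right)}{\Gamma\left(\frac{n_\beta-i}{2}\right)}\right\} \qquad (45\ \text{bis}).$$

For the case $n = 1$, we have

$$M_h(\lambda_{H'}) = \prod_{\beta=1}^{k}\left(\frac{N}{n_\beta}\right)^{\frac{n_\beta h}{2}}\;\frac{\Gamma\left(\frac{N-k}{2}\right)}{\Gamma\left(\frac{N(1+h)-k}{2}\right)}\prod_{\beta=1}^{k}\frac{\Gamma\left(\frac{n_\beta(1+h)-1}{2}\right)}{\Gamma\left(\frac{n_\beta-1}{2}\right)},$$

as found by E. S. Pearson and Neyman.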







10. Moments and Distributions of Ratios of Determinants of Correlation Coefficients.
The distribution of the correlation coefficient $r$ in samples from a normal
population in which the correlation is zero was first suggested by "Student"*
and later verified by Fisher†. From this distribution one can readily find that of
$1 - r^2$ which can be written as the determinant $\begin{vmatrix} 1 & r \\ r & 1 \end{vmatrix}$. In this section we shall
first consider the moments and distribution of the determinant of the correlation
coefficients in a sample from a normal population of $n$ independent variables.
That is, we shall find the moments and distribution of $|r_{ij}|$‡, where $r_{ij} = r_{ji}$ and
$r_{ii} = 1$. The distribution of variances and covariances in a sample from such a
population is given by (7), where the $\rho$'s are all made zero.

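Hence, we have, corresponding to (8),

$$\int e^{-\sum_{i=1}^{n}A_i a_{ii}}\;|a_{ij}|^{\frac{N-n-2}{2}}\,da = \pi^{\frac{n(n-1)}{4}}\prod_{i=1}^{n}\left[\Gamma\left(\frac{N-i}{2}\right)A_i^{-\frac{N-1}{2}}\right] \qquad (46),$$

where

$$A_i = \frac{N}{2\sigma_i^2}.$$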

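If we change the variables by the transformation

$$a_{ij} = r_{ij}\sqrt{a_{ii}a_{jj}}, \qquad (i, j = 1, 2, \dots n,\; i \neq j),$$

then

$$\int e^{-\sum_{i=1}^{n}A_i a_{ii}}\;\prod_{i=1}^{n}a_{ii}^{\frac{N-3}{2}}\;|r_{ij}|^{\frac{N-n-2}{2}}\,da\,dr = \pi^{\frac{n(n-1)}{4}}\prod_{i=1}^{n}\left[\Gamma\left(\frac{N-i}{2}\right)A_i^{-\frac{N-1}{2}}\right] \qquad (47).$$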


That the set $\{r_{ij}\}$ is distributed independently of the set $\{a_{ii}\}$ can be shown by
evaluating the characteristic function of $\{a_{ii}\}$ which is known to be

$$\phi(\alpha) = \prod_{i=1}^{n}\left[\frac{A_i}{A_i - \alpha_i}\right]^{\frac{N-1}{2}},$$

since the $a$'s are variances in samples from independent populations. This characteristic
function must also satisfy the relation

$$\phi(\alpha) = \frac{1}{H}\int e^{-\sum_{i=1}^{n}(A_i - \alpha_i)a_{ii}}\;\prod_{i=1}^{n}a_{ii}^{\frac{N-3}{2}}\;|r_{ij}|^{\frac{N-n-2}{2}}\,da\,dr,$$

where $H$ is the quantity on the right side of (47). From Stekloff's theory it
follows that

$$\int |r_{ij}|^{\frac{N-n-2}{2}}\,dr$$


* "Student": "The Probable Error of a Correlation Coefficient," Biometrika, Vol. VI. (1908—1909),
pp. 302—310.

† R. A. Fisher: Biometrika, Vol. X. (1915).

‡ See Ragnar Frisch: "Correlation and Scatter in statistical Variables," Nordisk Statistisk Tidskrift,
Vol. VIII. (1928), pp. 36—102.

In this paper Frisch considers the significance of the determinant $|r_{ij}|$ and its principal minors in
the procedure of fitting linear regression equations to scatter diagrams with special reference to the
matter of detecting the presence of irrelevant or uncorrelated variables. He refers to the quantity
$\sqrt{|r_{ij}|}$ as the collective scatter coefficient of the sample, but he does not consider the problem of
finding its sampling moments and distribution.








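is necessarily a constant independent of the $a$'s. Thus, we can deduce from (47)
that the $k$th moment of $|r_{ij}| = \omega$ is

$$M_k(\omega) = \frac{\left[\Gamma\left(\frac{N-1}{2}\right)\right]^{n-1}\prod_{i=2}^{n}\Gamma\left(\frac{N-i}{2}+k\right)}{\left[\Gamma\left(\frac{N-1}{2}+k\right)\right]^{n-1}\prod_{i=2}^{n}\Gamma\left(\frac{N-i}{2}\right)} \qquad (48).$$

The distribution of $\omega$ can therefore be written from (5) as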


/<•)- 


r »-i| 

ft 1 ) 

N-n 
1(0 2 

»(H-1) 

-a-.)'-,,,, , 

n 

n 

t=2 

■•ft'' 

n ,). 

o 




-1 


n(n—1) 3 n (w— 1) 5 n~8 

x (1 - Vi) * (1 (1 -t»„_,) 2 {1 - * (1 - «))-* 


n—2 


x[l — {Vj + flg (1 -t»l)) (1-w)]- 1 ... 
x [1 — {vi + i»»(l — Vi) +... + »„_,(1 — Vi )... (1 — «„_,)} (1 — <o)] 2 

x dvidvt ... dv n ~2 .(49). 

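For $n = 2$ we have

$$f_2(\omega) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{N-2}{2}\right)}\;\omega^{\frac{N-4}{2}}(1-\omega)^{-\frac{1}{2}} \qquad (49a),$$

the well-known distribution of $1 - r^2$.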
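The moments of the determinant of correlations can also be checked by simulation. The sketch below evaluates the $k$th moment of $\omega = |r_{ij}|$ for $n$ independent normal variables, written as a product of Gamma ratios over $i = 2, \dots n$, and compares the first moment for $n = 3$ with a Monte Carlo estimate. The function names, the sample size and the seed are illustrative assumptions.

```python
import math
import random

def corr_det_moment(k, n, N):
    """k-th moment of |r_ij| for samples of N from n independent normal variables."""
    g = math.lgamma
    return math.exp(sum(g((N - i) / 2 + k) - g((N - i) / 2)
                        + g((N - 1) / 2) - g((N - 1) / 2 + k)
                        for i in range(2, n + 1)))

def corr(u, v):
    """Product-moment correlation of two equal-length samples."""
    m = len(u)
    ub, vb = sum(u) / m, sum(v) / m
    suv = sum((a - ub) * (b - vb) for a, b in zip(u, v))
    suu = sum((a - ub) ** 2 for a in u)
    svv = sum((b - vb) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def simulate_corr_det3(N, reps, seed=0):
    """Monte Carlo mean of the 3 x 3 determinant of sample correlations."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(reps):
        x, y, z = ([rng.gauss(0, 1) for _ in range(N)] for _ in range(3))
        r12, r13, r23 = corr(x, y), corr(x, z), corr(y, z)
        acc += 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    return acc / reps

if __name__ == "__main__":
    print(corr_det_moment(1, 3, 10), simulate_corr_det3(10, 1000))
```

For $k = 1$ the moment reduces to $\prod_{i=2}^{n}(N-i)/(N-1)$. With $n = 2$ this gives $(N-2)/(N-1)$, the mean of $1 - r^2$.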

For n — 3 

/o ^ 

/s(«) = 


® 2 (l-6))-*^fn. 


.(49a), 


r<|>r(£=?)r(* / --») 




.(491). 


At this point we shall introduce a slightly more general function of the $r$'s and
consider the ratio of $\omega$ to the product of $k$ of its principal minors which are
mutually exclusive. Without loss of generality, we can consider the $k$ minors as
placed corner to corner down the main diagonal of $|r_{ij}|$, such that each element in
this diagonal (all equal to unity) is included in one of the minors.

Thus, let

$$Z = \frac{\omega}{\prod_{\beta=1}^{k}\omega_\beta} \qquad (50),$$

where $\omega_\beta$ is the $\beta$th principal minor from the upper left corner of $\omega$ and contains
the inter-correlations of $p_\beta$ variates and $\sum_{\beta=1}^{k} p_\beta = n$. If we multiply and divide the
quantity on the right in (50) by $\prod_{i=1}^{n} a_{ii}$, we have $Z$ expressed as the ratio of the
generalized variance to the product of $k$ of its principal minors, all of which are
mutually exclusive and are such that every element of the main diagonal of $|a_{ij}|$ is
contained in one of them. If (46) be integrated with respect to all of the $a$'s not
contained in this set of principal minors, which will be denoted collectively by $\bar{a}$,
we must have


» " 

[e *■ 


-lAiaut | 

Kl 


N-n-2 


da 


H (n~l ) n 

i 




k r PfilPfi-UPfi /A T-n\~] 

n w~r— nr Y 

J*al L ? a«l \ ■ / 


* 4 j. N-np—2 

e n \a m \~—*~d (a - a) .(51), 

0 = 1 


where |a^| is the /3th minor in [a#), and a — a is the set of a 1 s in (46) not 
contained in d. 

If both sides of (51) be multiplied by $\prod_{\beta=1}^{k} |a_{ij}|_\beta^{-h}$, which is constant as far as the
set $a - \bar{a}$ is concerned, and if N be replaced by N + 2h, afterwards integrating with
respect to $a - \bar{a}$, then multiplying by

N -1 

(^ 1^2 ««» An) 2 
n(n-l) n 


7T 


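we obtain the expression for the hth moment of Z, which is

$$M_h(Z) = \prod_{i=1}^{n}\frac{\Gamma\left(\frac{N-i}{2}+h\right)}{\Gamma\left(\frac{N-i}{2}\right)}\;\prod_{\beta=1}^{k}\prod_{\alpha=1}^{p_\beta}\frac{\Gamma\left(\frac{N-\alpha}{2}\right)}{\Gamma\left(\frac{N-\alpha}{2}+h\right)} \qquad (52).$$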


This is clearly the hth moment of a function satisfying an integral equation of
type (B), and hence the distribution of Z is of the form (5), where Z ranges from
0 to 1. An important case of (52) arises when $k = 2$, $p_1 = n - 1$, and $p_2 = 1$, which
yields the hth moment of $1 - R^2$, where R is the multiple correlation coefficient
in a sample from a population with its multiple correlation coefficient zero.
That is

$$M_h(Z_1) = \frac{\Gamma\left(\frac{N-n}{2}+h\right)\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{N-n}{2}\right)\Gamma\left(\frac{N-1}{2}+h\right)}.$$

From this we get the result

$$f_1(Z_1) = \frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{N-n}{2}\right)\Gamma\left(\frac{n-1}{2}\right)}\;Z_1^{\frac{N-n-2}{2}}\,(1-Z_1)^{\frac{n-1}{2}-1},$$

from which we can deduce the distribution of $R^2$, by the simple change of variable
$Z_1 = 1 - R^2$, a result originally obtained by Fisher.
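As a numerical illustration (a sketch added here, not part of the original paper), the moment formula (52) can be evaluated with log-gamma functions and compared with the moments of a Beta variate in the special case $k = 2$, $p_1 = n - 1$, $p_2 = 1$. The function names `wilks_moment` and `beta_moment` are hypothetical, and the code assumes (52) in the form $\prod_{i=1}^{n}\Gamma(\frac{N-i}{2}+h)/\Gamma(\frac{N-i}{2})$ times $\prod_\beta\prod_{\alpha=1}^{p_\beta}\Gamma(\frac{N-\alpha}{2})/\Gamma(\frac{N-\alpha}{2}+h)$, with $1 - R^2$ taken to follow a Beta law with parameters $(N-n)/2$ and $(n-1)/2$.

```python
import math

def wilks_moment(N, parts, h):
    """h-th moment of Z under the assumed form of (52).

    N     : sample size
    parts : group sizes p_1, ..., p_k (their sum is n)
    h     : order of the moment (need not be an integer)
    """
    n = sum(parts)
    log_m = 0.0
    # Product over the n variates of the full generalized variance.
    for i in range(1, n + 1):
        log_m += math.lgamma((N - i) / 2 + h) - math.lgamma((N - i) / 2)
    # Inverse products, one for each principal minor.
    for p in parts:
        for alpha in range(1, p + 1):
            log_m += math.lgamma((N - alpha) / 2) - math.lgamma((N - alpha) / 2 + h)
    return math.exp(log_m)

def beta_moment(a, b, h):
    """h-th moment of a Beta(a, b) variate."""
    return math.exp(math.lgamma(a + h) + math.lgamma(a + b)
                    - math.lgamma(a) - math.lgamma(a + b + h))

if __name__ == "__main__":
    N, n = 20, 5  # illustrative values only
    for h in (1, 2, 3.5):
        print(h, wilks_moment(N, [n - 1, 1], h),
              beta_moment((N - n) / 2, (n - 1) / 2, h))
```

With a single group (`parts = [n]`) the two products cancel and every moment equals 1, as it must when Z is identically unity.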







The distribution of Z can be found for a slightly more general problem than
the one we have just considered. Thus, suppose a sample of N items is drawn
from each of k independent normal populations, where the $\beta$th population has $p_\beta$
inter-correlated variates. Let $V_\beta$ be the generalized variance of the $\beta$th sample
and V the generalized variance of the k samples treated as one sample with
$\sum_{\beta=1}^{k} p_\beta = n$ variates. Then it can be shown in a straightforward manner by the foregoing
method that the hth moment of

$$\frac{V}{\prod_{\beta=1}^{k} V_\beta}$$

is identical with $M_h(Z)$.


The use of Z as a criterion for testing the hypothesis that the k samples are
from k independent normal populations can be interpreted as an extension of the
use of $1 - R^2$ as a criterion for testing the hypothesis that a sample of N items of
a single variable and a sample of N items of n − 1 variables are from independent
populations. For example, the criterion $Z_2$ appropriate to testing the hypothesis
that a sample of N items of two variables and one of N items of n − 2 variables
are from independent populations, will have its hth moment given by (52) for the
special case $k = 2$, $p_1 = 2$, and $p_2 = n - 2$, that is,


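$$M_h(Z_2) = \frac{\Gamma\left(\frac{N-n+1}{2}+h\right)\Gamma\left(\frac{N-n}{2}+h\right)\Gamma\left(\frac{N-1}{2}\right)\Gamma\left(\frac{N-2}{2}\right)}{\Gamma\left(\frac{N-n+1}{2}\right)\Gamma\left(\frac{N-n}{2}\right)\Gamma\left(\frac{N-1}{2}+h\right)\Gamma\left(\frac{N-2}{2}+h\right)},$$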


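and hence, from (5), we find the distribution of $Z_2$ to be

$$f_2(Z_2) = \frac{2^{\,n-3}\,\Gamma\left(\frac{N-1}{2}\right)\Gamma\left(\frac{N-2}{2}\right)}{\Gamma\left(\frac{N-n+1}{2}\right)\Gamma\left(\frac{N-n}{2}\right)\Gamma(n-2)}\;Z_2^{\frac{N-n-2}{2}}\left(1-\sqrt{Z_2}\right)^{n-3}.$$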
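The gamma-function expression for the hth moment of $Z_2$ can be checked against a density by numerical quadrature. The sketch below is an illustration added here, not the paper's own computation. It assumes that $\sqrt{Z_2}$ follows a Beta law with parameters $N - n$ and $n - 2$, which is what the moment expression implies through the duplication formula for the gamma function, so that the density of $Z_2$ is proportional to $Z_2^{(N-n-2)/2}(1-\sqrt{Z_2})^{n-3}$. All function names are hypothetical.

```python
import math

def z2_density(z, N, n):
    """Assumed density of Z_2 on (0, 1): sqrt(Z_2) ~ Beta(N - n, n - 2)."""
    log_c = (math.lgamma(N - 2) - math.lgamma(N - n)
             - math.lgamma(n - 2) - math.log(2.0))
    return math.exp(log_c) * z ** ((N - n - 2) / 2) * (1 - math.sqrt(z)) ** (n - 3)

def z2_moment_gamma(N, n, h):
    """h-th moment of Z_2 from the gamma-function expression."""
    num = [(N - n + 1) / 2 + h, (N - n) / 2 + h, (N - 1) / 2, (N - 2) / 2]
    den = [(N - n + 1) / 2, (N - n) / 2, (N - 1) / 2 + h, (N - 2) / 2 + h]
    return math.exp(sum(map(math.lgamma, num)) - sum(map(math.lgamma, den)))

def z2_moment_quad(N, n, h, steps=100_000):
    """Midpoint-rule integral of z**h times the assumed density over (0, 1)."""
    w = 1.0 / steps
    total = 0.0
    for k in range(steps):
        z = (k + 0.5) * w
        total += z ** h * z2_density(z, N, n)
    return total * w

if __name__ == "__main__":
    N, n = 12, 5  # illustrative values only
    for h in (0, 1, 2):
        print(h, z2_moment_gamma(N, n, h), z2_moment_quad(N, n, h))
```

For these illustrative values the first moment is $\frac{7 \cdot 8}{10 \cdot 11}$, the second moment of a Beta(7, 3) variate.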




It can be shown that Z is the $\lambda$-criterion appropriate to testing the hypothesis
that the n variables in the population fall into k groups, in each of which the
variates may be inter-correlated, but such that no variate in one group is correlated
with any variate in another.


The practical application of the criteria developed in this paper must be left 
for further discussion. 



MISCELLANEA, 


On a Method of proceeding from partial Cell Frequencies to 
Ordinates and to total Cell Frequencies in the case of a 
bivariate Frequency Surface. 

By JACQUES CHAPELIN, D.Sc.

It may happen that, to define a bivariate population, the whole population in every cell is not
observed, but only the population in a partial cell.

For instance, in order to get a rough evaluation of the amount of wood in a forest, foresters
used to divide the area of the forest into rectangular cells, and in each of the rectangles to
measure only the volume of the trees falling inside a partial domain, sometimes a strip parallel
to one of the sides of the rectangle, coaxial with it and with breadth one-tenth or one-twentieth
of the other side. Calling $\phi$ the (measured) volume corresponding to such a strip or partial cell,
it is required to find a plausible value for the volume $f$ inside the rectangle or total cell from
which, by addition, a plausible value for the volume of the forest may be deduced. An easy
solution would be to use a simple rule of proportionality: the volume of wood in a total cell
would be taken equal to the volume in the corresponding partial cell multiplied by the ratio of
the area of a total cell to a partial cell. This supposes that the z-ordinate corresponding to the
ideal density surface is satisfactorily represented by the z-ordinate of a hyperbolic paraboloid.
A more refined solution would be to use Pearson's interpolation surface of the fourth order*.
The aim of this paper is to obtain the necessary formulae: in the general case, the value of $f$ for
a total rectangle R is a linear and homogeneous function of the values of the nine $\phi$'s
corresponding to this rectangle R and to the eight rectangles adjoining R, and this linear form is defined
by the first line of Table III.

To take another example, let us suppose we have a population of N living animals, classified
into classes, according to a character X, to which corresponds a first variate x. We wish to
study the lethal dose y of a drug, according to the character X. For reasons of economy, we do
not want either to spend too much of this drug or to kill the whole population. Then, we
decide to try the drug on one-fifth of the population. We divide each class $(x - \tfrac{1}{2},\, x + \tfrac{1}{2})$ into
five equal parts, and we experiment only on the animals belonging to the middle class. Thus, we
are led to partial cells with breadth $\tfrac{1}{5}$ and height 1, giving a two-dimensional set of $\phi$'s from
which we shall have to deduce the $f$'s, in order to be able to build up the usual correlation
table. To that effect, we could also use the formulae defined by Table III of this paper.

We shall suppose that the basic net is a system of squares with unit sides and that the
system of partial cells consists of rectangles concentric and coaxial with the basic square cells.
We shall use the ordinary Pearsonian interpolation formula corresponding to the mid-panel
central difference formula up to and including second order differences. The sides of any of the
rectangular cells will be $\alpha$ and $\beta$, and we shall call $\phi_{00}$, $\phi_{01}$, ... the nine observed frequencies
in nine partial rectangular cells, according to the usual Pearsonian scheme. If $\alpha = \beta = 1$, the
quantities $\phi_{st}$ reduce to the usual frequencies $f_{st}$. The problem is to find the nine total
frequencies $f_{st}$ when the nine partial frequencies $\phi_{st}$ are known.

* Cf. Biometrika, Vol. XVII. p. 312, or Tables for Statisticians and Biometricians, Part II, p. xiii.





We have immediately the following integrals:



/*!+** *3 /•!+*« a 3 fl+k* a 3 

from which we deduce at once Table I leading to the expressions of the $\phi$'s as functions of the
ordinates z. From this table, we can write the equations

$$\frac{576}{\alpha\beta}\,\phi_{00} = 4\,(12-\alpha^2)(12-\beta^2)\,z_{00} + \alpha^2\beta^2 z_{11} + \dots, \quad \dots$$

The resolution of this system of nine linear equations leads to Table II, from which we can write
the equations

$$576\,\alpha\beta\,z_{00} = 4\,(12+\alpha^2)(12+\beta^2)\,\phi_{00} + \alpha^2\beta^2\phi_{11} + \dots, \quad \dots$$

At last, eliminating the ordinates $z_{st}$ between these nine equations and Pearson's formulae
(Biometrika, Vol. XVII. p. 312, or Tables for Statisticians and Biometricians, Part II, p. xiv,
formulae (a) to (t)), we obtain the formulae defined by Table III. They would read

$$576\,\alpha\beta\,f_{00} = 4\,(11+\alpha^2)(11+\beta^2)\,\phi_{00} + (1-\alpha^2)(1-\beta^2)\,\phi_{11} + \dots$$

It should be noticed that we obtain Pearson's formulae (loc. cit.) by supposing that, in Table I,
$\alpha = \beta = 1$, or that, in Table III, $\alpha$ and $\beta$ tend to zero, and that $\phi_{st}/\alpha\beta$ tends to $z_{st}$. Similarly, Table II
should lead to the other set of Pearson's formulae (Biometrika, Vol. XVII. p. 313, or Tables for
Statisticians and Biometricians, Part II, p. xiv, formulae (a') to (t')), by putting $\alpha = \beta = 1$. As this
is not so, these formulae should be replaced by the following*:

$$576\,z_{00} = 676 f_{00} + f_{11} + f_{1,-1} + f_{-1,1} + f_{-1,-1} - 26\,(f_{01} + f_{0,-1} + f_{10} + f_{-1,0}),$$

$$576\,z_{11} = 4 f_{00} + 529 f_{11} + f_{-1,-1} - 23\,(f_{1,-1} + f_{-1,1}) + 46\,(f_{01} + f_{10}) - 2\,(f_{0,-1} + f_{-1,0}),$$

$$576\,z_{1,-1} = 4 f_{00} + 529 f_{1,-1} + f_{-1,1} - 23\,(f_{11} + f_{-1,-1}) + 46\,(f_{0,-1} + f_{10}) - 2\,(f_{01} + f_{-1,0}),$$

$$576\,z_{-1,1} = 4 f_{00} + 529 f_{-1,1} + f_{1,-1} - 23\,(f_{11} + f_{-1,-1}) + 46\,(f_{01} + f_{-1,0}) - 2\,(f_{0,-1} + f_{10}),$$

$$576\,z_{-1,-1} = 4 f_{00} + 529 f_{-1,-1} + f_{11} - 23\,(f_{1,-1} + f_{-1,1}) + 46\,(f_{0,-1} + f_{-1,0}) - 2\,(f_{01} + f_{10}),$$

$$576\,z_{01} = 52 f_{00} + f_{-1,-1} + f_{1,-1} - 23\,(f_{11} + f_{-1,1}) + 598 f_{01} - 26 f_{0,-1} - 2\,(f_{10} + f_{-1,0}),$$

$$576\,z_{0,-1} = 52 f_{00} + f_{11} + f_{-1,1} - 23\,(f_{1,-1} + f_{-1,-1}) + 598 f_{0,-1} - 26 f_{01} - 2\,(f_{10} + f_{-1,0}),$$

$$576\,z_{10} = 52 f_{00} + f_{-1,1} + f_{-1,-1} - 23\,(f_{11} + f_{1,-1}) + 598 f_{10} - 26 f_{-1,0} - 2\,(f_{01} + f_{0,-1}),$$

$$576\,z_{-1,0} = 52 f_{00} + f_{11} + f_{1,-1} - 23\,(f_{-1,1} + f_{-1,-1}) + 598 f_{-1,0} - 26 f_{10} - 2\,(f_{01} + f_{0,-1}).$$
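The nine coefficients sets above can be reproduced mechanically. The following sketch is an illustration added here, not the author's method. It assumes the interpolation surface is the product of a quadratic in x and a quadratic in y through the nine ordinates, so that the two-dimensional coefficients are products of one-dimensional ones. In one dimension, integrating the quadratic through $z_{-1}, z_0, z_1$ over the three unit cells gives 24 times the cell frequencies as the rows of `A24`; `B24` is the corresponding inverse, also scaled by 24. All names are hypothetical.

```python
# 24 * (unit-cell frequencies f_{-1}, f_0, f_1) as linear forms in the
# ordinates z_{-1}, z_0, z_1 of the interpolating quadratic.
A24 = [[25, -2, 1],
       [1, 22, 1],
       [1, -2, 25]]

# 24 * (ordinates z_{-1}, z_0, z_1) as linear forms in f_{-1}, f_0, f_1.
B24 = [[23, 2, -1],
       [-1, 26, -1],
       [-1, 2, 23]]

IDX = (-1, 0, 1)

def matmul(x, y):
    """Product of two 3 x 3 integer matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def ordinate_coefficients(s, t):
    """Coefficients c[(u, v)] such that 576 * z_{st} = sum of c[(u, v)] * f_{uv}."""
    return {(u, v): B24[s + 1][u + 1] * B24[t + 1][v + 1]
            for u in IDX for v in IDX}

if __name__ == "__main__":
    # B24 really is the (scaled) inverse of A24: the product is 576 * identity.
    print(matmul(B24, A24))
    for (s, t) in [(0, 0), (1, 1), (0, 1)]:
        print((s, t), ordinate_coefficients(s, t))
```

For example, `ordinate_coefficients(0, 0)` gives 676 on $f_{00}$, −26 on the four side cells and 1 on the four corner cells, in agreement with the first equation.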

Lastly, if the basic cells are rectangles with sides h, k, and if the partial cells are rectangles
with sides h', k', concentric and coaxial to the rectangles of the basic cells, the right-hand sides
of the equations deduced from Table I should be multiplied by hk, with $\alpha$ and $\beta$ defined by the
relations $h' = \alpha h$, $k' = \beta k$; the right-hand sides of the equations deduced from Table II should be
divided by hk; and the right-hand sides of the equations deduced from Table III should remain
unchanged.


[I am extremely obliged to Dr Chapelin for his correction of my formulae. K. P.] 



TABLE I.

[Tabular matter illegible.]







On the Betas of Quadrilateral Distributions.

By OWEN L. DAVIES.

The Pearson Type curves are generated by a differential equation of the form

$$\frac{1}{y}\frac{dy}{dx} = \frac{x+a}{b_0 + b_1 x + b_2 x^2} \qquad (1).$$

These curves cover the whole range of distributions which are likely to be encountered in the
field of practical statistics. There are, however, several possible distributions not included in (1)
which, although rare in actual experience, have more than mere mathematical interest. Such,
for example, is the trapezium or triangle and in fact all curves for which $dy/dx$ is not continuous
at all points. The most interesting among these are the trapezia and quadrilaterals with one
point of discontinuity for $dy/dx$, namely, figures of the type

Triangular, rectangular and linear distributions are all particular cases of these.
I. Distributions following a Trapezium Law. 



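The equations of the lines DC and EB are respectively

$$y = \frac{c}{d}\,(x + d), \qquad y = \frac{c}{b-a}\,(b - x).$$

Let $M_i'$ be the ith moment of the whole figure about the vertical through O; then

$$M_i' = \int_{-d}^{0} y\,x^i\,dx + c\int_{0}^{a} x^i\,dx + \int_{a}^{b} y\,x^i\,dx.$$

Let

$$b = r\cos\theta, \qquad a = r\sin\theta, \qquad d = r\rho, \qquad \lambda = \cos\theta\sin\theta;$$

then

$$M_i' = \frac{c\,r^{i+1}}{(i+1)(i+2)}\left[\frac{\cos^{i+2}\theta - \sin^{i+2}\theta}{\cos\theta - \sin\theta} + (-)^i\rho^{\,i+1}\right],$$

and in particular

$$M_0' = \frac{cr}{2}\left[(\cos\theta + \sin\theta) + \rho\right] = N, \text{ the total frequency.}$$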


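Now $M_i' = N\mu_i'$, where $\mu_i'$ is the ith moment coefficient about O and, therefore, on simplification
the first four moment coefficients become

$$\mu_1' = \frac{r}{3}\,\frac{(1+\cos\theta\sin\theta) - \rho^2}{(\cos\theta+\sin\theta)+\rho},$$

$$\mu_2' = \frac{r^2}{6}\,\frac{(\cos\theta+\sin\theta) + \rho^3}{(\cos\theta+\sin\theta)+\rho},$$

$$\mu_3' = \frac{r^3}{10}\,\frac{(1+\cos\theta\sin\theta-\cos^2\theta\sin^2\theta) - \rho^4}{(\cos\theta+\sin\theta)+\rho},$$

$$\mu_4' = \frac{r^4}{15}\,\frac{(\cos\theta+\sin\theta)(1-\cos^2\theta\sin^2\theta) + \rho^5}{(\cos\theta+\sin\theta)+\rho}.$$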

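Referring these moments to the mean,

$$\mu_2 = \frac{r^2}{18(\epsilon+\rho)^2}\left[(1+2\lambda-2\lambda^2) + 3\rho\epsilon + 4\rho^2(1+\lambda) + 3\rho^3\epsilon + \rho^4\right],$$

$$\mu_3 = \frac{r^3}{270(\epsilon+\rho)^3}\left[(2+6\lambda-3\lambda^2-34\lambda^3) + 9\rho\epsilon(1+\lambda-6\lambda^2) + 3\rho^2(4-\lambda-29\lambda^2) - 45\rho^3\epsilon\lambda - \rho^4(12+39\lambda) - 9\rho^5\epsilon - 2\rho^6\right],$$

$$\mu_4 = \frac{2r^4}{270(\epsilon+\rho)^4}\left[(1+\lambda)^2(1+2\lambda-5\lambda^2) + 6\rho\epsilon(1+\lambda)(1+\lambda-3\lambda^2) + \rho^2(17+42\lambda-9\lambda^2-52\lambda^3) + 6\rho^3\epsilon(5+6\lambda-5\lambda^2) + 3\rho^4(12+24\lambda+\lambda^2) + 6\rho^5\epsilon(5+4\lambda) + \rho^6(17+26\lambda) + 6\rho^7\epsilon + \rho^8\right],$$

where $\lambda = \cos\theta\sin\theta$,

$\epsilon = \cos\theta + \sin\theta = (1+2\lambda)^{\frac{1}{2}}$.

The first two $\beta$'s, $\beta_1 = \mu_3^2/\mu_2^3$, $\beta_2 = \mu_4/\mu_2^2$,
of distributions following a trapezium law will thus depend on two parameters $\lambda = \cos\theta\sin\theta$
and $\rho$. Now $b \ge a$, b and a both positive; consequently, $\lambda$ is always positive and less than or
equal to $\tfrac{1}{2}$. Moreover, all distributions giving rise to different $\beta$'s are covered if we take $d \le b - a$, i.e.

$$0 \le \rho \le (\cos\theta - \sin\theta) \le 1.$$

By allowing $\lambda$ and $\rho$ to vary within the above limits, $\beta_1$ and $\beta_2$ will be seen to trace out an area
of finite extent on the $\beta_1$, $\beta_2$ plane. The limits to this area may be found by investigating the
$\beta_1$, $\beta_2$ lines of particular subcases.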



(i) $d = 0$, i.e. $\rho = 0$.

These distributions will correspond to figures of the following type: 



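Their moments and $\beta$'s may be obtained by putting $\rho = 0$ in the general relations above.
These give

$$\mu_1' = \frac{r}{3}\,\frac{1+\lambda}{(1+2\lambda)^{\frac{1}{2}}},$$

$$\mu_2 = \frac{r^2}{18(1+2\lambda)}\,(1+2\lambda-2\lambda^2),$$

$$\mu_3 = \frac{r^3}{270(1+2\lambda)^{\frac{3}{2}}}\,(2+6\lambda-3\lambda^2-34\lambda^3),$$

$$\mu_4 = \frac{2r^4}{270(1+2\lambda)^2}\,(1+\lambda)^2(1+2\lambda-5\lambda^2),$$

whence

$$\beta_1 = \frac{8\,(2+6\lambda-3\lambda^2-34\lambda^3)^2}{100\,(1+2\lambda-2\lambda^2)^3}, \qquad \beta_2 = \frac{24\,(1+\lambda)^2(1+2\lambda-5\lambda^2)}{10\,(1+2\lambda-2\lambda^2)^2}, \qquad 0 \le \lambda \le \tfrac{1}{2}.$$

$\beta_1$, $\beta_2$ of such distributions will trace out a line of finite extent on the $\beta_1$, $\beta_2$ plane connecting
the line point L to the rectangular point R. These are, in fact, limiting cases corresponding
respectively to $\lambda = 0$ ($a = 0$) and $\lambda = \tfrac{1}{2}$ ($a = b$) (see Fig. 1).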
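The closed forms for $\rho = 0$ can be checked against moments computed directly from the trapezium. The sketch below is an illustration added here under stated assumptions: the frequency curve rises linearly on $(-d, 0)$, is flat on $(0, a)$ and falls linearly on $(a, b)$, which is the geometry implied by the moment formula $M_i'$, and the closed forms are taken as $\beta_1 = 8(2+6\lambda-3\lambda^2-34\lambda^3)^2 / 100(1+2\lambda-2\lambda^2)^3$ and $\beta_2 = 24(1+\lambda)^2(1+2\lambda-5\lambda^2) / 10(1+2\lambda-2\lambda^2)^2$. Exact rational arithmetic is used so that the comparison is an equality rather than an approximation. The function names are hypothetical.

```python
from fractions import Fraction as F

def raw_moment(i, a, b, d):
    """i-th moment coefficient about O of the trapezium (a <= b, d >= 0)."""
    # (b**(i+2) - a**(i+2)) / (b - a), written as a sum so that a == b is allowed.
    tail = sum(F(b) ** j * F(a) ** (i + 1 - j) for j in range(i + 2))
    head = (-1) ** i * F(d) ** (i + 1)
    return 2 * (tail + head) / ((i + 1) * (i + 2) * (F(a) + b + d))

def betas(a, b, d=0):
    """Exact (beta_1, beta_2) from the first four moments."""
    m1, m2, m3, m4 = (raw_moment(i, a, b, d) for i in range(1, 5))
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2

def betas_closed_form(lam):
    """(beta_1, beta_2) for rho = 0 as functions of lambda = cos(theta) sin(theta)."""
    q = 1 + 2 * lam - 2 * lam ** 2
    b1 = F(8, 100) * (2 + 6 * lam - 3 * lam ** 2 - 34 * lam ** 3) ** 2 / q ** 3
    b2 = F(24, 10) * (1 + lam) ** 2 * (1 + 2 * lam - 5 * lam ** 2) / q ** 2
    return b1, b2

if __name__ == "__main__":
    a, b = 3, 4                      # so r = 5 and lambda = 12/25
    lam = F(a * b, a * a + b * b)
    print(betas(a, b), betas_closed_form(lam))
    print(betas(0, 1))               # right triangle: the line point L
    print(betas(1, 1))               # rectangle: the rectangular point R
```

The two limiting cases give (8/25, 12/5) and (0, 9/5), that is, the points L and R.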


(ii) $a = 0$, i.e. $\lambda = 0$.

The distributions will now be triangular, corresponding to figures of the type 


c 



D O B 


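Putting $\lambda = 0$ in the general relations, we have

$$\mu_2 = \frac{r^2}{18(1+\rho)^2}\left[1 + 3\rho + 4\rho^2 + 3\rho^3 + \rho^4\right],$$

$$\mu_3 = \frac{r^3}{270(1+\rho)^3}\left[2 + 9\rho + 12\rho^2 - 12\rho^4 - 9\rho^5 - 2\rho^6\right],$$

$$\mu_4 = \frac{2r^4}{270(1+\rho)^4}\left[1 + 6\rho + 17\rho^2 + 30\rho^3 + 36\rho^4 + 30\rho^5 + 17\rho^6 + 6\rho^7 + \rho^8\right].$$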




[FIGURE 1. Diagram of the $\beta_1$, $\beta_2$ plane showing the biquadratic, the limits for trapezia, and the parametric line for the distribution of Figure 2.]


These may be simplified considerably by writing

$$r\rho = R\cos\phi, \qquad r = R\sin\phi, \qquad \gamma = \cos\phi\sin\phi,$$

whence

$$\mu_2 = \frac{R^2}{18}\,(1+\gamma), \qquad \mu_3 = \frac{R^3}{270}\,(1-2\gamma)^{\frac{1}{2}}(2+5\gamma), \qquad \mu_4 = \frac{2R^4}{270}\,(1+\gamma)^2,$$

giving

$$\beta_1 = \frac{2}{25}\,\frac{(1-2\gamma)(2+5\gamma)^2}{(1+\gamma)^3}, \qquad \beta_2 = \frac{12}{5}.$$

$\beta_2$ has the same value for all triangular distributions, while $\beta_1$ varies between 0 and ·32.

The $\beta_1$, $\beta_2$ line is, therefore, of finite extent, parallel to the $\beta_1$ axis and joins the line point to
a point I on the $\beta_2$ axis. I corresponds to the symmetrical case, i.e. an isosceles triangle.


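(iii) Symmetrical Case $(b - a) = d$, i.e. $\rho = (\cos\theta - \sin\theta)$.

Substituting $\rho = (\cos\theta - \sin\theta)$ in the general relations, we have

$$\mu_2 = \frac{r^2}{24}\left[(3-4\lambda) + \epsilon\rho\right],$$

$$\mu_4 = \frac{r^4}{15(\epsilon+\rho)^4}\left[(8 - 16\lambda - 18\lambda^2 + 42\lambda^3 - 9\lambda^4) + 2\rho\epsilon(1-\lambda)(4-4\lambda-5\lambda^2)\right] = \frac{r^4}{480}\left[(19 - 44\lambda + 18\lambda^2) + (13-20\lambda)\,\epsilon\rho\right],$$

whence $\beta_1 = 0$,

$$\beta_2 = \frac{6}{10}\,\frac{(19-44\lambda+18\lambda^2) + (13-20\lambda)\,\epsilon\rho}{(5-12\lambda+6\lambda^2) + (3-4\lambda)\,\epsilon\rho},$$

$$\epsilon\rho = (\cos^2\theta - \sin^2\theta) = (1-4\lambda^2)^{\frac{1}{2}}, \qquad 0 \le \lambda \le \tfrac{1}{2}.$$

The substitution

$$(1-4\gamma^2)^{\frac{1}{2}} = 2\left[\frac{(1-2\lambda) + \epsilon\rho}{(3-4\lambda) + \epsilon\rho}\right]$$

will reduce $\beta_2$ to the simple expression

$$\beta_2 = \frac{12}{5}\,(1-\gamma^2).$$

When $\lambda = 0$, then $\gamma = 0$, and when $\lambda = \tfrac{1}{2}$, then $\gamma = \tfrac{1}{2}$.

The limits for $\gamma$ are, therefore, the same as those for $\lambda$, namely $(0, \tfrac{1}{2})$.

$\beta_2$ thus varies between 1·8 and 2·4 and the $\beta_1$, $\beta_2$ line is that part of the $\beta_2$ axis lying between
the rectangular point R and the isosceles point I.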

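Differentiating $\beta_1$ and $\beta_2$ for the general case, we find

$$\left(\frac{\partial\beta_1}{\partial\rho}\right)_{\rho=\lambda=0} = 0, \qquad \left(\frac{\partial\beta_1}{\partial\lambda}\right)_{\rho=\lambda=0} = 0, \qquad \left(\frac{\partial\beta_2}{\partial\rho}\right)_{\rho=\lambda=0} = 0, \qquad \left(\frac{\partial\beta_2}{\partial\lambda}\right)_{\rho=\lambda=0} = 0,$$

and

$$(\beta_2)_{\lambda=\rho=0} = 2{\cdot}4, \qquad (\beta_1)_{\lambda=\rho=0} = {\cdot}32.$$

The $\beta_1$, $\beta_2$ area is, therefore, bounded below by the line $\beta_2 = 2{\cdot}4$. Moreover, $\beta_1$ never exceeds
·32 and $\beta_2$ is never less than 1·8. It is fairly evident, then, that the area traced out by $\beta_1$ and
$\beta_2$ of distributions following a trapezium law is bounded by the $\beta_1$, $\beta_2$ lines of the following three
subcases:

(i) triangle,

(ii) symmetrical trapezium,

(iii) the quadrilateral formed by a rectangle and a right triangle (see Fig. 1).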



II. Quadrilaterals with One Point of Discontinuity for $dy/dx$.



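The ith moment of the quadrilateral OBCD about O is given by

$$M_i' = \frac{c}{(i+1)(i+2)}\left[\frac{b^{i+2} - a^{i+2}}{b-a} + a^{i+1}\rho\right],$$

and $M_0' = \dfrac{c}{2}\left[(b+a) + a\rho\right]$.

Write $b = r\cos\theta$, $a = r\sin\theta$, $\lambda = \cos\theta\sin\theta$,
then the first four moment coefficients about O become

$$\mu_1' = \frac{r}{3}\,\frac{(1+\lambda) + \rho\sin^2\theta}{(\cos\theta+\sin\theta) + \rho\sin\theta},$$

$$\mu_2' = \frac{r^2}{6}\,\frac{(\cos\theta+\sin\theta) + \rho\sin^3\theta}{(\cos\theta+\sin\theta) + \rho\sin\theta},$$

$$\mu_3' = \frac{r^3}{10}\,\frac{(1+\lambda-\lambda^2) + \rho\sin^4\theta}{(\cos\theta+\sin\theta) + \rho\sin\theta},$$

$$\mu_4' = \frac{r^4}{15}\,\frac{(\cos\theta+\sin\theta)(1-\lambda^2) + \rho\sin^5\theta}{(\cos\theta+\sin\theta) + \rho\sin\theta}.$$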

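Referring these moments to the centroid vertical,

(A)

$$\mu_2 = \frac{r^2}{18(\epsilon+\rho\sin\theta)^2}\left[(1+2\lambda-2\lambda^2) + \rho\{3\lambda(1-\lambda) + \sin^2\theta\,(2-\lambda)\} + \rho^2\sin^4\theta\right],$$

$$\mu_3 = \frac{r^3}{270(\epsilon+\rho\sin\theta)^3}\left[(2+6\lambda-3\lambda^2-34\lambda^3) + \rho\{9\lambda(1+3\lambda-7\lambda^2) + \sin^2\theta\,(6+3\lambda-39\lambda^2)\} + 3\rho^2\{2\sin^2\theta\,(1+3\lambda)(1-2\lambda) - \lambda^2(8\lambda-7)\} + 2\rho^3\sin^6\theta\right],$$

$$\mu_4 = \frac{2r^4}{270(\epsilon+\rho\sin\theta)^4}\left[(1+\lambda)^2(1+2\lambda-5\lambda^2) + \rho\{3\lambda(2+2\lambda-5\lambda^2-5\lambda^3) + \sin^2\theta\,(4+6\lambda-12\lambda^2-5\lambda^3)\} + 3\rho^2\{\lambda^2(1-5\lambda^2) + \sin^2\theta\,(2+4\lambda-7\lambda^2-4\lambda^3)\} + \rho^3\{\lambda^2(-4+5\lambda-3\lambda^2) + \sin^2\theta\,(4+4\lambda-10\lambda^2-7\lambda^3)\} + \rho^4\sin^8\theta\right].$$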


Subcases.

(i) $d = 0$, i.e. $\rho = 0$.

The moments and $\beta$'s reduce to those already found for the figure (p. 500).



(ii) $d = -c$, i.e. $\rho = -1$.

This subcase corresponds to triangular distributions, and on substitution the three moments
reduce to

$$\mu_2 = \frac{r^2}{18}\,(1-\lambda), \qquad \mu_3 = \frac{r^3}{270}\,(1+2\lambda)^{\frac{1}{2}}(2-5\lambda), \qquad \mu_4 = \frac{2r^4}{270}\,(1-\lambda)^2,$$

giving

$$\beta_1 = \frac{2}{25}\,\frac{(1+2\lambda)(2-5\lambda)^2}{(1-\lambda)^3}, \qquad \beta_2 = \frac{12}{5}.$$

This represents the same line already found for triangular distributions. The $\lambda$, however, has
a different meaning. The figure is symmetrical when $b = 2a$, i.e. $\cos\theta = 2\sin\theta$ or $\lambda = \tfrac{2}{5}$. For this
value of $\lambda$, $\beta_1 = 0$ as expected. For $\lambda = 0$ or $\tfrac{1}{2}$ the figure corresponds to a right triangle giving
$\beta_1 = {\cdot}32$, the coordinates of the line point.

(iii) $\lambda = \tfrac{1}{2}$.

The distributions will now be linear and correspond to figures of the following type.

Substituting $\lambda = \tfrac{1}{2}$ in relations (A), we have

$$\mu_2 = \frac{a^2}{18}\,\frac{6+6\rho+\rho^2}{(\rho+2)^2},$$

$$\mu_3 = \frac{2a^3}{270}\,\frac{\rho\,(9+9\rho+\rho^2)}{(\rho+2)^3},$$

$$\mu_4 = \frac{2a^4}{270}\,\frac{27+54\rho+39\rho^2+12\rho^3+\rho^4}{(\rho+2)^4},$$



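giving

$$\beta_1 = \frac{32}{100}\,\frac{\rho^2(9+9\rho+\rho^2)^2}{(6+6\rho+\rho^2)^3}, \qquad \beta_2 = \frac{12}{5}\,\frac{27+54\rho+39\rho^2+12\rho^3+\rho^4}{(6+6\rho+\rho^2)^2}.$$

Put $(\rho+1) = \tan\delta$ and $\lambda' = \cos\delta\sin\delta$, then

$$\beta_1 = \frac{8}{25}\,\frac{(1-2\lambda')(1+7\lambda')^2}{(1+4\lambda')^3}, \qquad \beta_2 = \frac{12}{5}\,\frac{(1+7\lambda')(1+\lambda')}{(1+4\lambda')^2}, \qquad 0 < \lambda' \le \tfrac{1}{2}.$$

We may give the following interpretation to $\lambda'$. Let $OC = b$, $O'a = a$ and write

$b = r\sin\theta$,

$a = r\cos\theta$, $r^2 = a^2 + b^2$,

then $\lambda'$ will be equal to $\cos\theta\sin\theta$.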


The curve* connecting $\beta_1$ and $\beta_2$ is


(2).


It passes through the points

$\beta_1 = {\cdot}32$, $\beta_2 = 2{\cdot}4$,
$\beta_1 = 0$, $\beta_2 = 1{\cdot}8$,

which are, respectively, the coordinates of the line point L and the rectangular point R.

Clearly, $\beta_2$ must be less than or equal to 2·4, and accordingly $\beta_1$ must be less than or equal
to ·32.


The line (2) may be readily plotted from the parametric equations by allowing $\lambda'$ to vary
between 0 and $\tfrac{1}{2}$. This line is of finite extent connecting the line point and the rectangular point,
and lies entirely within the biquadratic loop which is of fundamental importance in connection
with Pearson's Type curves. It is of interest, therefore, to determine how closely a Type I
curve

$$y = y_0\left(1+\frac{x}{a_1}\right)^{m_1}\left(1-\frac{x}{a_2}\right)^{m_2} \qquad (3)$$

will fit a linear distribution.


The best fit is obtained by identifying the first two moments and the range in both cases. 
For (3) we havet 

, m\ 

tr2 = my m2 m 1 ' = w 1 +11 

(m/+w 2 ') 2 (m l '+m 2 ' +1) m 2 t=m 2 + 1J * 

Hence («,' + m +1), 



and, therefore, 

niy = -f 1 —— ~Pi . 

.(4), 


sn»'“*n,+ l- y 1 ) (»»i + l). 

.(6). 


* Due to K. Pearson, 
t K. Pearson, Phil. Trans., Yol. 186, p. : 











Since — - and a t +a 2 »total range R , we may readily calculate the constants m t , i»„ 

7tl 2 

a x and 03 . Moreover, the area of the whole curve is equal to the total frequency N. This 
enables us to find the remaining constant ,v 0 . 


Example. X' = *25. 
We readily find 


R**r( cos0- sinfl-rx *707,1068, 
-596,2251, 



(s)* =5 ’ 074, ° 7407 ’ 

and the required Type I curve is found to be 

.">• 

How oloscly this curve fits the linear distribution may bo judged from Fig. 2 . 


y-1*476,6273 x'* 766273 

y-1-4.62,O82{l+ 1 . 36 J 0ag j' 341SO6 ( r;S gg^ 23 +l] O9, - 5 ° 0 


X- 3C-1*306,023 


QO\DRILAXER\L. 




FIQURE 2. 


Throughout the whole range of variation of X', the ft, ft line remains fairly close to the lower 
branch of the biquadratic. We may, therefore, reasonably expect a fairly good fit of the type 


y=*>( 1+ s)” 


(Pearson Type IX curve). 


Range x= - a to 0. 

Identifying the range and mean of the two distributions, we have 


/ m! + l \ p x ' 

W1 + 2; R' 


from which m x may be calculated. y 0 is found by equating the whole area to the total fre¬ 
quency y. 










For X'-= *25, the curve is found to be 

(fy -1-476,6273 (1 +*/«)W>W3 ...(7), 

which is plotted together with (6) and the line in Fig. 2. It can be seen that (7) is almost as 
good a fit as the more general curve (6)*. 

Let us return now to the general distributions of the type : 



The first two fts were found to depend on two parameters X and p. The ft**, therefore, trace 
t an area on the ft, ft plane. 

- (i) When the point D varies between 0 and C , i.e. - 1 and O^X^i, the ft, ft area is 

mtical with the one mapped out by the first two jS’s of trapezia. 

(ii) When D lies between C and L> i.e. . and 0<X^$, the area traced out 

cos a — sm p 

by the fts is that part of the plane bounded by the ft, ft lines of 

(a) linear distributions, 

( b) distributions represented by quadrilaterals formed by placing together a rectangle and 
a right triaugle. 

(iii) When D lies beyond L , i.e. the ft, ft area is bounded by a loop which 

is the envelope of tho lines X=constant for £ ^ X ^ #, p ^ t,. 

When 0^X<jj, the ft, ft area is bounded by the loop X = $, p^l, i.e. 

8 (218 + 61 Op + 294p a + 2p 3 ) 2 
100 (37+26p+p 2 ) 3 

_ 12 1226+ 1780p + 894p 3 + 196p 3 +p 4 
5 ' (37 + 26p+p 2 ) 2 

III. Convex Curvet. 


The first two /3’s of the above quadrilateral distributions occupy a relatively small portion 
of the ft, ft plane and those which are convex at all points fall within the area bounded by 
the lines (i) ft „ 0 . 

>2-4. 


(ii) ft- 

< l “>( 1 -&)-( 1 ^) (*+*</>-«)• 


Denote this region by C. 

* Fig. 2 provides an interesting example of the extent to which equality in the first four moments 
leads to correspondence in form; 








Convex curves are necessarily of limited range and, therefore, of Pearson's curves, those
which are convex at all points must have the form

(iv) $y = y_0\,x^s(1-x)^t$.

Differentiating twice gives $d^2y/dx^2$. This is negative throughout the whole range provided
$s + t \le 1$, neither s nor t being negative. This is, therefore, the condition for convexity.

When s or t is zero, the $\beta$'s of (iv) lie on the lower branch of the biquadratic, and when $s = t$
the curve is symmetrical, giving $\beta_1 = 0$. Furthermore, when $s + t = 1$, we have


»-S (?-*)• 

—!<?-*) 


Eliminating we have 


*-(8-0 (1+0- 


(v) $4\beta_2 - 5\beta_1 - 8 = 0$,

which is a straight line passing through the points ($\beta_1 = 0$, $\beta_2 = 2$) and L.

It is clear then that the region of convexity for the Pearson curves is bounded by

(a) the lower branch of the biquadratic,

(b) the $\beta_2$ axis,

(c) $4\beta_2 - 5\beta_1 - 8 = 0$.

This region lies entirely within C.


2\cr/ j 8 


The normal curve y=y«c 2 ^ a/ is convex l>etween 
#th moment about the vertical through the origin and 


its points of inflexion. Let M % denote the 
± aa the points of inflexion, then 





=yo^ +1 r a «(6+i). 


Hence 

The points of inflexion are given by 


M 2if _ rg*(*+j) 

v 2 * r*Q) * 

g=0, i.o.a=i. Hence 


* r *( 25 > 

a* r*('5) ’ a* r,('8) ’ 


which give ft= 1'941, ft=0. 

r*(^ 

For one-half of the curve ■■'—• —— 


from which we deduce 
Both these points lie within 0. 


ft* L872 \ 
ft= -0273J 





For the quadrant of the ellipse 


y-y«(l—**)* <P<*<1), 

r 1 

Mi (about the origin)«y 0 1(1-#*)* 

r('-±i) • 

""to ,(4-*)- 


from which we deduce 


ft = 1*97721 
ft- *0554/ 


The trigonometrical curve y~2/o cos # ^ ^ ^ 

is convex. If Mi denotes the *th moment, about the origin, of one-half of the curve, 

n 

/: cos x & dx — i/o' /*#', 


After reduction we find 


Jfo-yo* 


'■•-(*)■-« © + *’ 

-(I-)- 


whence $\beta_2 = 2{\cdot}2317$,

$\beta_1 = {\cdot}1797$.

For the complete symmetrical curve we find

$\beta_2 = 2{\cdot}1938$,
$\beta_1 = 0$.

Both these points fall in the region C.

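Finally, the curve $y = y_0\,(1 - e^{-x})$ $(x > 0)$
is convex at all points. If we take the range (0, 1), we have

$$M_i' \text{ (about the origin)} = y_0\int_0^1 (1 - e^{-x})\,x^i\,dx = y_0\left\{\frac{1}{i+1} - \int_0^1 e^{-x}x^i\,dx\right\}.$$

Evaluating the integrals and referring the moments to the centroid vertical, we have

$\mu_2 = {\cdot}058{,}830$,

$\mu_3 = -{\cdot}006{,}427$,

$\mu_4 = {\cdot}007{,}716$,

giving $\beta_2 = 2{\cdot}2295$,

$\beta_1 = 0{\cdot}2029$,


which lies within C.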



Consequently, for all convex curves considered above, the first two $\beta$'s fall in the region C.
I cannot conceive of a convex curve which does not give this result, and it seems quite probable
that the $\beta$'s of all convex curves fall in the region C*.



[Figure: the $\beta_1$, $\beta_2$ plane showing the limits to convex Pearson curves, the suggested limits to all convex curves, and the upper and lower branches of the biquadratic.]




[* A direct proof that all convex frequency curves must lie in the area C would be of considerable 
interest. Ed.] 


CAMBRIDGE : PRINTED BY WALTER LEWIS, M.A., AT THE UNIVERSITY PRESS 






